Abstract

Departing from traditional communication theory where decoding algorithms are assumed to perform without error, a system where noise perturbs both computational devices and communication channels is considered here. This paper studies limits in processing noisy signals with noisy circuits by investigating the effect of noise on standard iterative decoders for low-density parity-check codes. Concentration of decoding performance around its average is shown to hold when noise is introduced into message-passing and local computation. Density evolution equations for simple faulty iterative decoders are derived. In one model, computing nonlinear estimation thresholds shows that performance degrades smoothly as decoder noise increases, but arbitrarily small probability of error is not achievable. Probability of error may be driven to zero in another system model; the decoding threshold again decreases smoothly with decoder noise. As an application of the methods developed, an achievability result for reliable memory systems constructed from unreliable components is provided.

Index Terms 

Low-density parity-check codes, communication system fault tolerance, density evolution, decoding, memories 


I. Introduction 


The basic goal in channel coding is to design encoder-decoder pairs that allow reliable communication over noisy channels at information rates close to capacity [1]. The primary obstacle in the quest for practical capacity-achieving codes has been decoding complexity [2]-[4]. Low-density parity-check (LDPC) codes have, however, emerged as a class of codes that have performance at or near the Shannon limit [5], [6] and yet are sufficiently structured as to have decoders with circuit implementations [7]-[9].

In addition to decoder complexity, decoder reliability may also limit practical channel coding.¹ In Shannon's schematic diagram of a general communication system [1, Fig. 1] and in the traditional information and communication theories that have developed within the confines of that diagram, noise is localized in the communication channel: the decoder is assumed to operate without error. Given the possibility of unreliable computation on faulty hardware, there is value in studying error-prone decoding. In fact, Hamming's original development of parity-check codes was motivated by applications in computing rather than in communication [11].
The goal of this paper is to investigate limits of communication systems with noisy decoders, and it has dual motivations. The first is the eminently practical motivation of determining how well error control codes work when decoders are faulty. The second is the deeper motivation of determining fundamental limits for processing unreliable signals with unreliable computational devices, illustrated schematically in Fig. 1. The motivations are intertwined. As noted by Pierce, "The down-to-earth problem of making a computer work, in fact, becomes tangled with this difficult philosophical problem: 'What is possible and what is impossible when unreliable circuits are used to process unreliable information?'" [11].

A first step in understanding these issues is to analyze a particular class of codes and decoding techniques: iterative message-passing decoding algorithms for LDPC codes. When the code is represented as a factor graph, algorithm computations occur at nodes and algorithm communication is carried out over edges. Correspondence between the factor graph and the algorithm is not only a tool for exposition but also the way decoders are implemented [7]-[9]. In traditional performance analysis, the decoders are assumed to work without error. In this paper, there will be transient local computation and message-passing errors, whether the decoder is analog or digital.

¹One may also consider the effect of encoder complexity [10]; however, encoder noise need not be explicitly considered, since it may be incorporated into channel noise using the noise combining argument suggested by Fig. 3.

Fig. 1. Schematic diagram of an information system that processes unreliable signals with unreliable circuits.

When the decoder itself is noisy, one might believe that achieving arbitrarily small probability of error (Shannon reliability) is not possible, but this is indeed possible for certain sets of noisy channels and noisy decoders. This is shown by example. For other sets of noisy channels and noisy decoders, Shannon reliability is not achievable, but error probability tending to extremely small values is achievable. Small probability of error, $\eta$, is often satisfactory in practice, and so $\eta$-reliable performance is also investigated. Decoding thresholds at $\eta$-reliability decrease smoothly with increasing decoder noise. Communication systems may therefore display graceful degradation with respect to noise levels in the decoder.

The remainder of the paper is organized as follows. Section II reviews motivations and related work. Section III formalizes notation and Section IV gives concentration results that allow the density evolution method of analysis, generalizing results in [13]. A noisy version of the Gallager A decoder for processing the output of a binary symmetric channel is analyzed in Section V, where it is shown that Shannon reliability is unattainable. In Section VI, a noisy decoder for AWGN channels is analyzed. For this model, the probability of error may be driven to zero and the decoding threshold degrades smoothly as a function of decoder noise. As an application of the results of Section V, Section VII precisely characterizes the information storage capacity of a memory built from unreliable components. Section VIII provides some conclusions.

II. Background 

A. Practical Motivations 

Although always present [11], [14], recent technological trends in digital circuit design bring practical motivations to the fore [15]-[17]. The 2008 update of the International Technology Roadmap for Semiconductors (ITRS)² points out that for complementary metal-oxide-silicon (CMOS) technology, increasing power densities, decreasing supply voltages, and decreasing sizes have increased sensitivity to cosmic radiation, electromagnetic interference, and thermal fluctuations. The ITRS further says that an ongoing shift in the manufacturing paradigm will dramatically reduce costs but will lead to more transient failures of signals, logic values, devices, and interconnects. Device technologies beyond CMOS, such as single-electron tunnelling technology [18], carbon-based nanoelectronics [19], and chemically assembled electronic nanocomputers [20], are also projected to enter production, but they all display erratic, random device behavior [21], [22].

Analog computations are always subject to noise [23], [24]. Similar issues arise when performing real-valued computations on digital computers, since quantization, whether fixed-point or floating-point, is often well-modeled as bounded, additive stochastic noise [25].
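As a small illustration of the last point, the sketch below assumes a uniform rounding quantizer with step size `step` (the helper names are hypothetical). It shows that the error introduced by rounding is confined to $[-\mathrm{step}/2, \mathrm{step}/2]$, which is what makes a bounded additive noise model natural; whether the error also behaves like independent random noise depends on the signal and is not checked here.

```python
import random


def quantize(x: float, step: float) -> float:
    """Uniform fixed-point quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)


def quantization_noise(x: float, step: float) -> float:
    """Additive noise seen downstream, so that quantize(x) = x + noise."""
    return quantize(x, step) - x


if __name__ == "__main__":
    rng = random.Random(0)
    step = 2.0 ** -6  # e.g. 6 fractional bits
    errors = [quantization_noise(rng.uniform(-4.0, 4.0), step) for _ in range(10_000)]
    # Every error lies in [-step/2, step/2], up to floating-point rounding.
    print(max(abs(e) for e in errors) <= step / 2 + 1e-12)
```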



B. Coding and Computing 

Information and communication theory have provided limits for processing unreliable signals with reliable circuits [1], [5], [6], [26], whereas fault-tolerant computing theory has provided limits for processing reliable signals (inputs) with unreliable circuits [12], [27]-[31]. This work brings the two together.

²The overall objective of the ITRS is to present the consensus of the semiconductor industry on the best current estimate of research and development needs for the next fifteen years.
A brief overview of terms and concepts from fault-tolerant computing, based on [32], [33], is now provided. A fault is a physical defect, imperfection, or flaw that occurs within some hardware or software component. An error is the informational manifestation of a fault. A permanent fault exists indefinitely until corrective action is taken, whereas a transient fault appears and disappears in a short period of time. Noisy circuits in which the interconnection pattern of components is a tree are called formulas [34], [35].

In an error model, the effects of faults are given directly in the informational universe. For example, the basic von Neumann model of noisy circuits [27] models transient faults in logic gates and wires as message and node computation noise that is both spatially and temporally independent; this has more recently also been called the Hegde-Shanbhag model [36], after [37]. This error model is used here. Error models of permanent faults [38], [39] or of miswired circuit interconnection [28], [30] have been considered elsewhere. Such permanent errors in decoding circuits may be interpreted either as changing the factor graph used for decoding or as introducing new potentials into the factor graph; the code used by the encoder and the code used by the decoder are then different.

There are several design philosophies to combat faults. Fault avoidance seeks to make physical components more reliable. Fault masking seeks to prevent faults from introducing errors. Fault tolerance is the ability of a system to continue performing its function in the presence of faults. This paper is primarily concerned with fault tolerance, but Section VII considers fault masking.

C. Related Work 

Empirical characterizations of message-passing decoders have demonstrated that probability of error performance does not change much when messages are quantized at high resolution [26]. Even algorithms that are coarsely quantized versions of optimal belief propagation show little degradation in performance [13], [41]-[46]. It should be emphasized, however, that fault-free, quantized decoders differ significantly from decoders that make random errors.³ The difference is similar to that between control systems with finite-capacity noiseless channels and control systems with noisy channels of equal capacity [50]. Seemingly the only previous work on message-passing algorithms with random errors is [51], which deals with problems in distributed inference.⁴

The information theoretic problem of mismatch capacity [52] and its analog for iterative decoding [53] deal with scenarios where an incorrect decoding metric is used. This may arise, e.g., due to incorrect estimation of the channel noise power. For message-passing decoding algorithms, mismatch leads to incorrect parameters for local computations. These are permanent faults rather than the kind of transient faults considered in this paper.

Noisy LDPC decoders were previously analyzed in the context of designing reliable memories from unreliable components [54], [55] (revisited in Section VII), using Gallager's original methods [26]. Several LDPC code analysis tools have since been developed, including simulation [56], expander graph arguments [57], [58], EXIT charts [59], [60], and density evolution [13], [61], [62]. This work generalizes asymptotic characterizations developed by Richardson and Urbanke for noiseless decoders [13], showing that density evolution is applicable to faulty decoders. Expander graph arguments have also been extended to the case of noisy decoding in a paper [63] that appeared concurrently with the first presentation of this work [64]. Note that previous works have not even considered the possibility that Shannon reliability is achievable with noisy decoding.

III. Codes, Decoders, and Performance 

This section establishes the basic notation of LDPC channel codes and message-passing decoders for communication systems depicted in Fig. 1. It primarily follows established notation in the field [13], [65], and will therefore be brief. Many of the notational conventions are depicted schematically in Fig. 2 using a factor graph-based decoder implementation.

³Randomized algorithms [47] and stochastic computation [48] (used for decoding in [49]) make use of randomness to increase functionality, but the randomness is deployed in a controlled manner.

⁴If the graphical model of the code and the graph of noisy communication links in a distributed system coincide, then the distributed inference problem and the message-passing decoding problem can be made to coincide.

⁵A factor graph determines an "ordered code," but the opposite is not true [66]. Moreover, since codes are unordered objects, several "ordered codes" are in fact the same code.

Fig. 2. Schematic diagram of a factor graph-based implementation of a noisy decoder circuit. Only one variable-to-check message and one check-to-variable message are highlighted. Other wires, shown in gray, will also carry noisy messages.

Consider the standard ensemble of $(d_v, d_c)$-regular LDPC codes of length $n$, $\mathcal{C}^n(d_v, d_c)$, defined by a uniform measure on the set of labeled bipartite factor graphs with variable node degree $d_v$ and check node degree $d_c$.⁵ There are $n$ variable nodes corresponding to the codeword letters and $n d_v / d_c$ check nodes corresponding to the parity check constraints. The design rate of the code is $1 - d_v/d_c$, though the actual rate might be higher since not all checks may be independent; the true rate converges to the design rate for large $n$ [65, Lemma 3.22]. One may also consider irregular codes, $\mathcal{C}^n(\lambda, \rho)$, characterized by the degree distribution pair $(\lambda, \rho)$. Generating functions of the variable node and check node degree distributions, $\lambda(\zeta)$ and $\rho(\zeta)$, are functions of the form $\lambda(\zeta) = \sum_{i \geq 2} \lambda_i \zeta^{i-1}$ and $\rho(\zeta) = \sum_{i \geq 2} \rho_i \zeta^{i-1}$, where $\lambda_i$ and $\rho_i$ specify the fraction of edges that connect to nodes with degree $i$. The design rate is $1 - \int_0^1 \rho(\zeta)\,d\zeta \big/ \int_0^1 \lambda(\zeta)\,d\zeta$.
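The design-rate formula can be checked numerically. The sketch below uses hypothetical helper names and represents edge-perspective degree distributions as dictionaries mapping a degree $i$ to $\lambda_i$ or $\rho_i$. It evaluates $1 - \int_0^1 \rho / \int_0^1 \lambda$ exactly with rational arithmetic, using $\int_0^1 \zeta^{i-1}\,d\zeta = 1/i$, and the regular case reduces to $1 - d_v/d_c$ as stated above.

```python
from fractions import Fraction
from typing import Dict

DegreeDist = Dict[int, Fraction]  # degree i -> fraction of edges attached to degree-i nodes


def integral(dist: DegreeDist) -> Fraction:
    """Integral over [0, 1] of sum_i c_i * z**(i - 1), which equals sum_i c_i / i."""
    return sum((c / i for i, c in dist.items()), Fraction(0))


def design_rate(lam: DegreeDist, rho: DegreeDist) -> Fraction:
    """Design rate 1 - (int_0^1 rho) / (int_0^1 lambda) for edge-perspective distributions."""
    for name, dist in (("lambda", lam), ("rho", rho)):
        if sum(dist.values()) != 1:
            raise ValueError(f"{name} coefficients must sum to 1")
    return 1 - integral(rho) / integral(lam)


if __name__ == "__main__":
    # (3, 6)-regular ensemble: lambda(z) = z**2, rho(z) = z**5, so the rate is 1 - 3/6.
    print(design_rate({3: Fraction(1)}, {6: Fraction(1)}))
    # An irregular pair chosen only for illustration.
    print(design_rate({2: Fraction(1, 2), 3: Fraction(1, 2)}, {6: Fraction(1)}))
```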

In the communication system of Fig. 1, a codeword is selected by the transmitter and is sent through the noisy channel. Channel input and output letters are denoted $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$. Since binary linear codes are used, $\mathcal{X}$ can be taken as $\{\pm 1\}$. The receiver contains a noisy message-passing decoder, which is used to process the channel output codeword to produce an estimate of $X$ that is denoted $\hat{X}$. The goal of the receiver is to recover the channel input codeword with low probability of error. Throughout this work, probability of bit error $P_e$ is used as the performance criterion:⁶

$$P_e = \Pr[X \neq \hat{X}].$$

The message-passing decoder works in iterative stages and the iteration time is indexed by $\ell = 0, 1, \ldots$. Within the decoder, at time $\ell = 0$, each variable node has a realization of $Y$, $y_i$. A message-passing decoder exchanges messages between nodes along wires. First each variable node sends a message to a neighboring check node over a noisy messaging wire. Generically, sent messages are denoted as $\nu_{v \to c}$, message wire noise realizations as $w_{v \to c}$, and received messages as $\mu_{v \to c}$; assume without loss of generality that $\nu_{v \to c}$, $w_{v \to c}$, and $\mu_{v \to c}$ are drawn from a common messaging alphabet $\mathcal{M}$.

Each check node processes received messages and sends back a message to each neighboring variable node over a noisy message wire. The noisiness of the check node processing is generically denoted by an input random variable $U_c \in \mathcal{U}$. The check node computation is denoted $\Phi^{(\ell)} : \mathcal{M}^{d_c - 1} \times \mathcal{U} \to \mathcal{M}$. The notations $\nu_{c \to v}$, $\mu_{c \to v}$, and $w_{c \to v}$ are used for signaling from check node to variable node; again without loss of generality assume that

$$\nu_{c \to v}, w_{c \to v}, \mu_{c \to v} \in \mathcal{M}.$$

Each variable node now processes its $y_i$ and the messages it receives to produce new messages. The new messages are produced through possibly noisy processing, where the noise input is generically denoted $U_v \in \mathcal{U}$. The variable node computation is denoted $\Psi^{(\ell)} : \mathcal{Y} \times \mathcal{M}^{d_v - 1} \times \mathcal{U} \to \mathcal{M}$. Local computations and message-passing continue iteratively.

⁶An alternative would be to consider block error probability; however, an exact evaluation of this quantity is difficult due to the dependence between different symbols of a codeword, even if the bit error probability is the same for all symbols in the codeword [67].
Message passing induces decoding neighborhoods, which involve nodes/wires that have communicated with one another. For a given node $\dot{n}$, its neighborhood of depth $d$ is the induced subgraph consisting of all nodes reached and edges traversed by paths of length at most $d$ starting from $\dot{n}$ (including $\dot{n}$) and is denoted $\mathcal{N}_{\dot{n}}^{d}$. The directed neighborhood of depth $d$ of a wire $v \to c$, denoted by $\mathcal{N}_{v \to c}^{d}$, is defined as the induced subgraph containing all wires and nodes on paths starting from the same place as $v \to c$ but different from $v \to c$. Equivalently, for a wire $c \to v$, $\mathcal{N}_{c \to v}^{d}$ is the induced subgraph containing all wires and nodes on paths starting from the same place as $c \to v$ but different from $c \to v$. If the induced subgraph (corresponding to a neighborhood) is a tree, then the neighborhood is tree-like; otherwise it is not tree-like. The neighborhood is tree-like if and only if all involved nodes are distinct.
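The criterion "all involved nodes are distinct" can be tested directly. A minimal sketch, assuming the factor graph is given as a 0/1 parity-check matrix `H` (rows are check nodes, columns are variable nodes); the helper `is_tree_like` is hypothetical and handles the node neighborhood $\mathcal{N}_{\dot{n}}^{d}$ of a variable node rather than a directed wire neighborhood. It unwraps the graph level by level without walking back along the edge it arrived on and reports a repeated node as soon as two distinct paths meet.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Node = Tuple[str, int]  # ("v", j) is variable node j, ("c", i) is check node i


def neighbors(H: List[List[int]], node: Node) -> List[Node]:
    """Nodes adjacent to `node` in the bipartite factor graph of parity-check matrix H."""
    kind, idx = node
    if kind == "v":
        return [("c", i) for i, row in enumerate(H) if row[idx]]
    return [("v", j) for j, bit in enumerate(H[idx]) if bit]


def is_tree_like(H: List[List[int]], var: int, depth: int) -> bool:
    """True if the depth-`depth` neighborhood of variable node `var` has no repeated node."""
    root: Node = ("v", var)
    seen = {root}
    frontier: Deque[Tuple[Node, Optional[Node]]] = deque([(root, None)])
    for _ in range(depth):
        next_frontier: Deque[Tuple[Node, Optional[Node]]] = deque()
        for node, parent in frontier:
            for nb in neighbors(H, node):
                if nb == parent:
                    continue  # do not walk back along the edge just traversed
                if nb in seen:
                    return False  # two distinct paths meet: a cycle of length <= 2 * depth
                seen.add(nb)
                next_frontier.append((nb, node))
        frontier = next_frontier
    return True


if __name__ == "__main__":
    # Variables 0 and 1 share checks 0 and 1, forming a cycle of length 4.
    H_cycle = [[1, 1, 0], [1, 1, 1]]
    print(is_tree_like(H_cycle, 0, 1), is_tree_like(H_cycle, 0, 2))
    # A chain-shaped factor graph has no cycles at any depth.
    H_chain = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
    print(is_tree_like(H_chain, 0, 10))
```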

Note that only extrinsic information is used in node computations. Also note that in the sequel, all decoder noises ($U_c$, $U_v$, $W_{v \to c}$, and $W_{c \to v}$) will be assumed to be independent of each other, as in the von Neumann error model of faulty computing.

A communication system is judged by information rate, error probability, and blocklength. For fixed channels, 
information theory specifies the limits of these three parameters when optimizing over the unconstrained choice of 
codes and decoders; Shannon reliability is achievable for rates below capacity in the limit of increasing blocklength. 
When decoders are restricted to be noisy, tighter information theoretic limits are not known. Therefore comparing 
performance of systems with noisy decoders to systems using identical codes but noiseless decoders is more 
appropriate than comparing to Shannon limits. 

Coding theory follows from information theory by restricting decoding complexity; analysis of noisy decoders 
follows from coding theory by restricting decoding reliability. 

IV. Density Evolution Concentration Results 

Considering the great successes achieved by analyzing the noiseless decoder performance of ensembles of codes [13], [61], rather than of particular codes [26], the same approach is pursued for noisy decoders. The first mathematical contribution of this work is to extend the method of analysis promulgated in [13] to the case of decoders with random noise.

Several facts that simplify performance analysis are proven. First, under certain symmetry conditions with wide 
applicability, the probability of error does not depend on which codeword is transmitted. Second, the individual 
performances of codes in an ensemble are, with high probability, the same as the average performance of the 
ensemble. Finally, this average behavior converges to the behavior of a code defined on a cycle-free graph. 
Performance analysis then reduces to determining average performance on an infinite tree: a noisy formula is 
analyzed in place of general noisy circuits. 

For brevity, only regular LDPC codes are considered in this section; however, the results can be generalized to irregular LDPC codes. In particular, replacing node degrees by maximum node degrees, the proofs stand mutatis mutandis. Similarly, only binary LDPC codes are considered; generalizations to non-binary alphabets also follow, as in [68].

A. Restriction to All-One Codeword 

If certain symmetry conditions are satisfied by the system, then the probability of error is conditionally independent of the codeword that is transmitted. It is assumed throughout this section that messages in the decoder are in belief format.

Definition 1: A message in an iterative message-passing decoder for a binary code is said to be in belief format if the sign of the message indicates the bit estimate and the magnitude of the message is an increasing function of the confidence level. In particular, a positive-valued message indicates belief that a bit is $+1$, whereas a negative-valued message indicates belief that a bit is $-1$. A message of magnitude $0$ indicates complete uncertainty, whereas a message of infinite magnitude indicates complete confidence in a bit value.

Note, however, that it is not obvious that this is the best format for noisy message-passing [65, Appendix B.1]. The symmetry conditions can be restated for messages in other formats. The several symmetry conditions are:

Definition 2 (Channel Symmetry): A memoryless channel is binary-input output-symmetric if it satisfies

$$p(Y_t = y \mid X_t = 1) = p(Y_t = -y \mid X_t = -1)$$

for all channel usage times $t = 1, \ldots, n$.

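Definition 3 (Check Node Symmetry): A check node message map is symmetric if it satisfies

$$\Phi^{(\ell)}(b_1 \mu_1, \ldots, b_{d_c - 1} \mu_{d_c - 1}, b_{d_c} u) = \Phi^{(\ell)}(\mu_1, \ldots, \mu_{d_c - 1}, u) \prod_{i=1}^{d_c} b_i$$

for any $\pm 1$ sequence $(b_1, \ldots, b_{d_c})$. That is to say, the signs of the messages and the noise factor out of the map.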
Definition 4 (Variable Node Symmetry): A variable node message map is symmetric if it satisfies

$$\Psi^{(0)}(-\mu_0, -u) = -\Psi^{(0)}(\mu_0, u)$$

and

$$\Psi^{(\ell)}(-\mu_0, -\mu_1, \ldots, -\mu_{d_v - 1}, -u) = -\Psi^{(\ell)}(\mu_0, \mu_1, \ldots, \mu_{d_v - 1}, u),$$

for $\ell \geq 1$. That is to say, the initial message from the variable node only depends on the received value and internal noise, and there is sign inversion invariance for all messages.

Definition 5 (Message Wire Symmetry): Consider any message wire to be a mapping $\Xi : \mathcal{M} \times \mathcal{M} \to \mathcal{M}$. Then a message wire is symmetric if

$$\mu = \Xi(\nu, w) = -\Xi(-\nu, -w),$$

where $\mu$ is any message received at a node when the message sent from the opposite node is $\nu$, and $w$ is message wire noise with distribution symmetric about 0.

An example where the message wire symmetry condition holds is if the message wire noise $w$ is additive and symmetric about 0. Then $\mu = \nu + w = -(-\nu - w)$ and $w$ is symmetric about 0.
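Definitions 3 through 5 can be spot-checked numerically for concrete maps. The sketch below assumes, purely for illustration, a min-sum check node map whose output sign is flipped by a noise symbol $u \in \{\pm 1\}$, a sum-type variable node map with additive noise $u$, and an additive message wire; none of these maps is taken from this paper, and the function names are hypothetical. It draws random messages, noise values, and sign patterns and tests each identity.

```python
import math
import random
from typing import Sequence


def check_map(mu: Sequence[float], u: int) -> float:
    """Illustrative min-sum check map; the noise symbol u in {+1, -1} flips the output sign."""
    sign = math.prod(1 if m >= 0 else -1 for m in mu)
    return u * sign * min(abs(m) for m in mu)


def variable_map(mu0: float, mu: Sequence[float], u: float) -> float:
    """Illustrative variable map: channel message plus extrinsic messages plus additive noise u."""
    return mu0 + sum(mu) + u


def wire(nu: float, w: float) -> float:
    """Message wire with additive noise w."""
    return nu + w


def check_node_symmetric(dc: int = 6, trials: int = 1000, seed: int = 0) -> bool:
    """Test Definition 3 on random messages, noise symbols, and sign patterns b_1..b_dc."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu = [rng.gauss(0.0, 2.0) for _ in range(dc - 1)]
        u = rng.choice((1, -1))
        b = [rng.choice((1, -1)) for _ in range(dc)]
        lhs = check_map([bi * m for bi, m in zip(b, mu)], b[-1] * u)
        if not math.isclose(lhs, check_map(mu, u) * math.prod(b), abs_tol=1e-12):
            return False
    return True


def variable_node_and_wire_symmetric(dv: int = 3, trials: int = 1000, seed: int = 1) -> bool:
    """Test Definitions 4 and 5: negating every argument negates the output."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu0, u, w, nu = (rng.gauss(0.0, 2.0) for _ in range(4))
        mu = [rng.gauss(0.0, 2.0) for _ in range(dv - 1)]
        ok = (
            math.isclose(variable_map(-mu0, [], -u), -variable_map(mu0, [], u), abs_tol=1e-12)
            and math.isclose(variable_map(-mu0, [-m for m in mu], -u),
                             -variable_map(mu0, mu, u), abs_tol=1e-12)
            and math.isclose(wire(nu, w), -wire(-nu, -w), abs_tol=1e-12)
        )
        if not ok:
            return False
    return True


if __name__ == "__main__":
    print(check_node_symmetric(), variable_node_and_wire_symmetric())
```

Note that adding noise to the check node output, instead of flipping its sign, would generally break Definition 3, which is why the illustrative check map uses sign-flip noise.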

Theorem 1 (Conditional Independence of Error): For a given binary linear code and a given noisy message-passing algorithm, let $P_e^{(\ell)}(x)$ denote the conditional probability of error after the $\ell$th decoding iteration, assuming that codeword $x$ was sent. If the channel and the decoder satisfy the symmetry conditions given in Definitions 2-5, then $P_e^{(\ell)}(x)$ does not depend on $x$.

Proof: Modification of [13, Lemma 1] or [65, Lemma 4.92]. Appendix A gives details. ■

Suppose a system meets these symmetry conditions. Since probability of error is independent of the transmitted 
codeword and since all LDPC codes have the all-one codeword in the codebook, one may assume without loss 
of generality that this codeword is sent. Doing so removes the randomness associated with transmitted codeword 
selection. 



B. Concentration around Ensemble Average 

The next simplification follows by seeing that the average performance of the ensemble of codes rather than 
the performance of a particular code may be studied, since all codes in the ensemble perform similarly. The 
performances of almost all LDPC codes closely match the average performance of the ensemble from which they 
are drawn. The average is over the instance of the code, the realization of the channel noise, and the realizations 
of the two forms of decoder noise. To simplify things, assume that the number of decoder iterations is fixed at some finite $\ell$. Let $Z$ be the number of incorrect values held among all $d_v n$ variable node-incident edges at the end of the $\ell$th iteration (for a particular code, channel noise realization, and decoder noise realization) and let $E[Z]$ be the expected value of $Z$. By constructing a martingale through sequentially revealing all of the random elements and then using the Hoeffding-Azuma inequality, it can be shown that:

Theorem 2 (Concentration Around Expected Value): There exists a positive constant $\beta = \beta(d_v, d_c, \ell)$ such that for any $\epsilon > 0$,

$$\Pr\left[\,|Z - E[Z]| > n d_v \epsilon / 2\,\right] \leq 2 e^{-\beta \epsilon^2 n}.$$

Proof: Follows the basic ideas of the proofs of [13, Theorem 2] or [65, Theorem 4.94]. Appendix B gives details. ■

A primary communication system performance criterion is probability of error $P_e$; if the number of incorrect values $Z$ concentrates, then so does $P_e$.
C. Convergence to the Cycle-Free Case

The previous theorem showed that the noisy decoding algorithm behaves essentially deterministically for large n. 
As now shown, this ensemble average performance converges to the performance of an associated tree ensemble, 
which will allow the assumption of independent messages. 

For a given edge whose directed neighborhood of depth $2\ell$ is tree-like, let $p$ be the expected number of incorrect messages received along this edge (after message noise) at the $\ell$th iteration, averaged over all graphs, inputs, and decoder noise realizations of both types.

Theorem 3 (Convergence to Cycle-Free Case): There exists a positive constant $\gamma = \gamma(d_v, d_c, \ell)$ such that for any $\epsilon > 0$ and $n > 2\gamma/\epsilon$,

$$\left|E[Z] - n d_v p\right| < n d_v \epsilon / 2.$$

The proof is identical to the proof of [13, Theorem 2]. The basic idea is that the computation tree created by 
unwrapping the code graph to a particular depth [69] almost surely has no repeated nodes. 

The concentration and convergence results directly imply concentration around the average performance of a tree 
ensemble: 

Theorem 4 (Concentration Around Cycle-Free Case): There exist positive constants $\beta = \beta(d_v, d_c, \ell)$ and $\gamma = \gamma(d_v, d_c, \ell)$ such that for any $\epsilon > 0$ and $n > 2\gamma/\epsilon$,

$$\Pr\left[\,|Z - n d_v p| > n d_v \epsilon\,\right] \leq 2 e^{-\beta \epsilon^2 n}.$$

Proof: Follows directly from Theorems 2 and 3. ■



D. Density Evolution 

With the conditional independence and concentration results, all randomness is removed from explicit consideration and all messages are independent. The problem reduces to density evolution, the analysis of a discrete-time dynamical system [62]. The dynamical system state variable of most interest is the probability of bit error, $P_e$.



Denote the probability of bit error of a code $g \in \mathcal{C}^n$ after $\ell$ iterations of decoding by $P_e^{(\ell)}(g, \varepsilon, \alpha)$, where $\varepsilon$ is a channel noise parameter (such as noise power or crossover probability) and $\alpha$ is a decoder noise parameter (such as logic gate error probability). Then density evolution computes

$$\lim_{n \to \infty} E\left[P_e^{(\ell)}(g, \varepsilon, \alpha)\right],$$

where the expectation is over the choice of the code and the various noise realizations. The main interest is in the long-term behavior of the probability of error after performing many iterations. The long-term behavior of a generic dynamical system may be a limit cycle or a chaotic attractor; however, density evolution usually converges to a stable fixed point. Monotonicity (either increasing or decreasing) with respect to iteration number $\ell$ need not hold, but it often does. If there is a stable fixed point, the limiting performance corresponds to

$$\eta^* = \lim_{\ell \to \infty} \lim_{n \to \infty} E\left[P_e^{(\ell)}(g, \varepsilon, \alpha)\right].$$

In channel coding, certain sets of parameters $(g, \varepsilon, \alpha)$ lead to "good" performance, in the sense of small $\eta^*$, whereas other sets of parameters lead to "bad" performance with large $\eta^*$. The goal of density evolution analysis is to determine the boundary between these good and bad sets.

Though it is natural to expect the performance of an algorithm to improve as the quality of its input improves and as more resources are allocated to it, this may not be so. For many decoders, however, there is a monotonicity property that limiting behavior $\eta^*$ improves as channel noise $\varepsilon$ decreases and as decoder noise $\alpha$ decreases. Moreover, just as in other nonlinear estimation systems for dimensionality-expanding signals [70]-[72], there is a threshold phenomenon such that the limiting probability of error may change precipitously with the values of $\varepsilon$ and $\alpha$.

Fig. 3. Local computation noise may be incorporated into message-passing noise without essential loss of generality.

In traditional coding theory, there is no parameter $\alpha$, and the goal is often to determine the range of $\varepsilon$ for which $\eta^*$ is zero. The boundary is often called the decoding threshold and may be denoted $\varepsilon^*(\eta^* = 0)$. A decoding threshold for optimal codes under optimal decoding may be computed from the rate of the code $g$ and the capacity of the channel as a function of $\varepsilon$, $C(\varepsilon)$. Since this Shannon limit threshold is for optimal codes and decoders, it is clearly an upper bound to $\varepsilon^*(0)$ for any given code and decoder. If the target error probability $\eta^*$ is non-zero, then the Shannon limit threshold is derived from the so-called $\eta^*$-capacity, $C(\varepsilon)/(1 - h_2(\eta^*))$, rather than $C(\varepsilon)$.⁷
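For a BSC with crossover probability $\varepsilon$, $C(\varepsilon) = 1 - h_2(\varepsilon)$, so the Shannon limit threshold for a code of rate $R$ at target error $\eta^*$ is the largest $\varepsilon \le 1/2$ with $R \le C(\varepsilon)/(1 - h_2(\eta^*))$. A minimal sketch that finds it by bisection, assuming a BSC, $0 < R < 1$, and $0 \le \eta^* < 1/2$ (the helper names are hypothetical):

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def eta_capacity_bsc(eps: float, eta: float) -> float:
    """eta-capacity C(eps) / (1 - h2(eta)) of a BSC with crossover probability eps."""
    return (1.0 - h2(eps)) / (1.0 - h2(eta))


def shannon_threshold_bsc(rate: float, eta: float = 0.0, tol: float = 1e-12) -> float:
    """Largest eps in [0, 1/2] with rate <= eta-capacity (capacity decreases on [0, 1/2])."""
    lo, hi = 0.0, 0.5
    if eta_capacity_bsc(hi, eta) >= rate:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eta_capacity_bsc(mid, eta) >= rate:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(shannon_threshold_bsc(0.5))        # rate 1/2, eta* = 0
    print(shannon_threshold_bsc(0.5, 1e-3))  # relaxing eta* raises the limit
```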

In the case of faulty decoders, the Shannon limits also provide upper bounds on the $\varepsilon$-boundary for the set of $(\varepsilon, \alpha)$ that achieve good performance. One might hope for a Shannon theoretic characterization of the entire $(\varepsilon, \alpha)$-boundary, but as noted previously, such results are not extant. Alternately, in the next sections, sets of $(\varepsilon, \alpha)$ that can achieve $\eta^*$-reliability for particular LDPC codes $g \in \mathcal{C}^n$ are characterized using the density evolution method developed in this section.

V. Example: Noisy Gallager A Decoder 

Section IV showed that density evolution equations determine the performance of almost all codes in the large blocklength regime. Here the density evolution equation for a simple noisy message-passing decoder, a noisy version of Gallager's decoding algorithm A [26], [74], is derived. The algorithm has message alphabet $\mathcal{M} = \{\pm 1\}$, with messages in belief format simply indicating the estimated sign of a bit. Although this simple decoding algorithm cannot match the performance of belief propagation due to its restricted messaging alphabet $\mathcal{M}$, it is of interest since it is of extremely low complexity and can be analyzed analytically [74].

Consider decoding the LDPC-coded output of a binary symmetric channel (BSC) with crossover probability $\varepsilon$. At a check node, the outgoing message along an edge is the product of all incoming messages excluding the one incoming on that edge, i.e., the check node map $\Phi$ is the XOR operation. At a variable node, the outgoing message is the original received code symbol unless all incoming messages give the opposite conclusion. That is,

$$\Psi(y, \mu_1, \ldots, \mu_{d_v - 1}) = \begin{cases} -y, & \text{if } \mu_1 = \cdots = \mu_{d_v - 1} = -y, \\ y, & \text{otherwise.} \end{cases}$$

There is no essential loss of generality in combining computation noise and message-passing noise into a single form of noise, as demonstrated schematically in Fig. 3 and proven in [75, Lemma 3.1]. This noise combining is performed in the sequel to reduce the number of decoder noise parameters and allow a clean examination of the central phenomenon. Thus, each message in the Gallager algorithm A is passed over an independent and identical BSC wire with crossover probability $\alpha$.
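To make the decoder concrete, here is a minimal simulation sketch of the noisy Gallager A decoder just described. It assumes a 0/1 parity-check matrix `H`, a flooding schedule, messages in $\{\pm 1\}$, and every message in both directions sent over an independent BSC($\alpha$) wire. The function names are hypothetical, and the final bit decision (a majority of $y_j$ and the received check messages) is an illustrative assumption rather than something specified in the text.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (check index i, variable index j) with H[i][j] == 1


def bsc(symbol: int, p: float, rng: random.Random) -> int:
    """Pass a +1/-1 symbol through a binary symmetric wire with flip probability p."""
    return -symbol if rng.random() < p else symbol


def noisy_gallager_a(H: List[List[int]], y: List[int], alpha: float,
                     iters: int, rng: random.Random) -> List[int]:
    """Flooding-schedule Gallager A where every message crosses an independent BSC(alpha)."""
    m, n = len(H), len(H[0])
    edges: List[Edge] = [(i, j) for i in range(m) for j in range(n) if H[i][j]]
    checks_of = {j: [i for (i, jj) in edges if jj == j] for j in range(n)}
    vars_of = {i: [j for (ii, j) in edges if ii == i] for i in range(m)}

    # Iteration 0: each variable node sends its channel value y_j over a noisy wire.
    v2c: Dict[Edge, int] = {(i, j): bsc(y[j], alpha, rng) for (i, j) in edges}
    c2v: Dict[Edge, int] = {}
    for _ in range(iters):
        # Check node map: product (XOR) of the other received messages, then a noisy wire.
        for (i, j) in edges:
            out = math.prod(v2c[(i, k)] for k in vars_of[i] if k != j)
            c2v[(i, j)] = bsc(out, alpha, rng)
        # Variable node map: send -y_j only if all other received messages equal -y_j.
        for (i, j) in edges:
            others = [c2v[(k, j)] for k in checks_of[j] if k != i]
            flip = bool(others) and all(mu == -y[j] for mu in others)
            v2c[(i, j)] = bsc(-y[j] if flip else y[j], alpha, rng)

    # Hypothetical bit decision: majority of y_j and all received check messages (ties keep y_j).
    decisions = []
    for j in range(n):
        total = y[j] + sum(c2v.get((i, j), 0) for i in checks_of[j])
        decisions.append(1 if total > 0 else -1 if total < 0 else y[j])
    return decisions


if __name__ == "__main__":
    # Variable 0 sits in three checks whose other variables are all distinct.
    H = [[1, 1, 1, 0, 0, 0, 0],
         [1, 0, 0, 1, 1, 0, 0],
         [1, 0, 0, 0, 0, 1, 1]]
    y = [-1, 1, 1, 1, 1, 1, 1]  # all-one codeword with bit 0 flipped by the channel
    print(noisy_gallager_a(H, y, alpha=0.0, iters=2, rng=random.Random(0)))
    print(noisy_gallager_a(H, y, alpha=0.05, iters=2, rng=random.Random(0)))
```

Averaging the fraction of wrong decisions over many random channel and wire realizations on large, locally tree-like graphs would give an empirical counterpart to the density evolution analysis that follows.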

The density evolution equation leads to an analytic characterization of the set of $(\varepsilon, \alpha)$ pairs, which parameterize the noisiness of the communication system.

A. Density Evolution Equation 

The density evolution equation is developed for general irregular LDPC ensembles. The state variable of density evolution, $s_\ell$, is taken to be the expected probability of bit error at the variable nodes in the large blocklength limit, denoted here as $P_e^{(\ell)}(\varepsilon, \alpha)$.

The original received message is in error with probability $\varepsilon$, thus

$$P_e^{(0)}(\varepsilon, \alpha) = s_0 = \varepsilon.$$

The initial variable-to-check message is in error with probability $(1 - \varepsilon)\alpha + \varepsilon(1 - \alpha)$, since it is passed through a BSC($\alpha$). For further iterations $\ell$, the probability of error $P_e^{(\ell)}(\varepsilon, \alpha)$ is found by induction. Assume $P_e^{(i)}(\varepsilon, \alpha) = s_i$ for $0 \leq i \leq \ell$. Now consider the error probability of a check-to-variable message in the $(\ell + 1)$th iteration. A check-to-variable message emitted by a check node of degree $d_c$ along a particular edge is the product of all the $(d_c - 1)$ incoming messages along all other edges. By assumption, each such message is in error with probability $s_\ell$ and all messages are independent. These messages are passed through BSC($\alpha$) before being received, so the probability of being received in error is

$$s_\ell (1 - \alpha) + (1 - s_\ell)\alpha = \alpha + s_\ell - 2\alpha s_\ell.$$

⁷The function $h_2(\cdot)$ is the binary entropy function. The $\eta^*$-capacity expression is obtained by adjusting capacity by the rate-distortion function of an equiprobable binary source under frequency of error constraint $\eta^*$, $R(\eta^*) = 1 - h_2(\eta^*)$ [73].

Due to the XOR operation, the outgoing message will be in error if an odd number of these received messages are in error. The probability of this event, averaged over the degree distribution, yields the probability

$$\frac{1 - \rho\left[1 - 2(\alpha + s_\ell - 2\alpha s_\ell)\right]}{2}.$$

Now consider $P_e^{(\ell+1)}(\epsilon, \alpha)$, the error probability at the variable node in the $(\ell + 1)$th iteration. Consider an edge which is connected to a variable node of degree $d_v$. The outgoing variable-to-check message along this edge is in error in the $(\ell + 1)$th iteration if the original received value is in error and not all incoming messages are received correctly, or if the originally received value is correct but all incoming messages are in error. The first event has probability

$$\epsilon\left(1 - \left[(1 - \alpha)\,\frac{1 + \rho\left[1 - 2(\alpha + s_\ell - 2\alpha s_\ell)\right]}{2} + \alpha\,\frac{1 - \rho\left[1 - 2(\alpha + s_\ell - 2\alpha s_\ell)\right]}{2}\right]^{d_v - 1}\right).$$

The second event has probability

$$(1 - \epsilon)\left[(1 - \alpha)\,\frac{1 - \rho\left[1 - 2(\alpha + s_\ell - 2\alpha s_\ell)\right]}{2} + \alpha\,\frac{1 + \rho\left[1 - 2(\alpha + s_\ell - 2\alpha s_\ell)\right]}{2}\right]^{d_v - 1}.$$
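Averaging over the degree distribution and adding the two terms together yields the density evolution equation in recursive form:

$$s_{\ell+1} = \epsilon - \epsilon\, q_\alpha^+(s_\ell) + (1 - \epsilon)\, q_\alpha^-(s_\ell). \qquad (1)$$

The expressions

$$q_\alpha^+(s) = \lambda\left(\frac{1 + \rho(\omega_\alpha(s)) - 2\alpha\rho(\omega_\alpha(s))}{2}\right),$$

$$q_\alpha^-(s) = \lambda\left(\frac{1 - \rho(\omega_\alpha(s)) + 2\alpha\rho(\omega_\alpha(s))}{2}\right),$$

and $\omega_\alpha(s) = (2\alpha - 1)(2s - 1)$ are used to define the density evolution recursion.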
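Recursion (1) is straightforward to iterate numerically. The following Python sketch is a minimal implementation under stated assumptions: degree distributions are given in edge perspective as (hypothetical) dictionaries mapping a degree $d$ to the fraction of edges attached to degree-$d$ nodes, so that $\lambda(\zeta) = \sum_d \lambda_d \zeta^{d-1}$, and a fixed iteration count stands in for the limit $\ell \to \infty$.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def q_pair(s, alpha, lam, rho):
    """Return (q_alpha^+(s), q_alpha^-(s)) as defined in the text."""
    w = (2 * alpha - 1) * (2 * s - 1)              # omega_alpha(s)
    r = evaluate(rho, w)
    q_plus = evaluate(lam, (1 + r - 2 * alpha * r) / 2)
    q_minus = evaluate(lam, (1 - r + 2 * alpha * r) / 2)
    return q_plus, q_minus


def de_step(s, eps, alpha, lam, rho):
    """One step of recursion (1): compute s_{l+1} from s_l."""
    q_plus, q_minus = q_pair(s, alpha, lam, rho)
    return eps - eps * q_plus + (1 - eps) * q_minus


def run_de(eps, alpha, lam, rho, iters=2000):
    """Iterate (1) from s_0 = eps and return the final bit error probability."""
    s = eps
    for _ in range(iters):
        s = de_step(s, eps, alpha, lam, rho)
    return s


LAM_36, RHO_36 = {3: 1.0}, {6: 1.0}   # (3,6) regular: lambda(z) = z^2, rho(z) = z^5

if __name__ == "__main__":
    print(run_de(0.03, 0.0, LAM_36, RHO_36))    # noiseless decoder, below threshold
    print(run_de(0.05, 0.0, LAM_36, RHO_36))    # noiseless decoder, above threshold
    print(run_de(0.02, 0.005, LAM_36, RHO_36))  # noisy decoder: positive error floor
```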



B. Performance Evaluation 

With the density evolution equation established, the performance of the coding-decoding system with particular values of the quality parameters $\epsilon$ and $\alpha$ may be determined. Taking the bit error probability as the state variable, stable fixed points of the deterministic, discrete-time dynamical system are to be found. Usually one would want the probability of error to converge to zero, but since this might not be possible, a weaker performance criterion may be needed. To start, consider partially noiseless cases.

1) Noisy Channel, Noiseless Decoder: For the noiseless decoder case, i.e., $\alpha = 0$, it has been known that there are thresholds on $\epsilon$, below which the probability of error goes to zero as $\ell$ increases, and above which the probability of error goes to some large value. These can be found analytically for the Gallager A algorithm [74].
2) Noiseless Channel, Noisy Decoder: For the noisy Gallager A system under consideration, the probability of error does not go to zero as $\ell$ goes to infinity for any $\alpha > 0$. This can be seen by considering the case of the perfect original channel, $\epsilon = 0$, and any $\alpha > 0$. The density evolution equation reduces to

$$s_{\ell+1} = q_\alpha^-(s_\ell), \qquad (2)$$

with $s_0 = 0$. The recursion does not have a fixed point at zero, and since error probability is bounded below by zero, it must increase. The derivative is

$$\frac{d q_\alpha^-(s)}{ds} = (2\alpha - 1)^2\, \rho'(\omega_\alpha(s))\, \lambda'\left(\frac{1 - \rho(\omega_\alpha(s)) + 2\alpha\rho(\omega_\alpha(s))}{2}\right),$$

which is greater than zero for $0 \le s < 1/2$ and $0 < \alpha < 1/2$; thus the error evolution forms a monotonically increasing sequence. Since the sequence is monotone increasing starting from zero, and there is no fixed point at zero, it converges to the smallest real solution of $s = q_\alpha^-(s)$, because the fixed point cannot be jumped due to monotonicity.
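The monotone approach to a positive fixed point under recursion (2) can be checked numerically. This sketch iterates $s_{\ell+1} = q_\alpha^-(s_\ell)$ from $s_0 = 0$; the helper names are hypothetical, and the (3,6) regular ensemble with $\alpha = 0.005$ is chosen purely for illustration.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def q_minus(s, alpha, lam, rho):
    """q_alpha^-(s) from the text."""
    w = (2 * alpha - 1) * (2 * s - 1)              # omega_alpha(s)
    r = evaluate(rho, w)
    return evaluate(lam, (1 - r + 2 * alpha * r) / 2)


def noiseless_channel_trajectory(alpha, lam, rho, iters=200):
    """Iterate recursion (2) from s_0 = 0 and return the whole trajectory."""
    traj = [0.0]
    for _ in range(iters):
        traj.append(q_minus(traj[-1], alpha, lam, rho))
    return traj


if __name__ == "__main__":
    traj = noiseless_channel_trajectory(0.005, {3: 1.0}, {6: 1.0})
    print(traj[:5])     # strictly increasing from zero
    print(traj[-1])     # smallest positive solution of s = q_alpha^-(s)
```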

3) Noisy Channel, Noisy Decoder: The same phenomenon must also happen if the starting $s_0$ is positive; however, the value to which density evolution converges is a non-zero fixed point solution of the original equation (1), not of (2), and is a function of both $\alpha$ and $\epsilon$. Intuitively, for somewhat large initial values of $\epsilon$, the noisy decoder decreases the probability of error in the first few iterations, just like the noiseless one, but when the error probability becomes close to the internal decoder error, the probability of error settles at that level. This is summarized in the following proposition.

Proposition 1: Final error probability $\eta^* > 0$ for any LDPC ensemble decoded using the noisy Gallager A system defined in Section V, for every decoder noise level $\alpha > 0$ and every channel noise level $\epsilon$. □

The fact that probability of error cannot asymptotically be driven to zero with the noisy Gallager decoder is expected, yet seemingly displeasing. In a practical scenario, however, the ability to drive $P_e$ to a very small number is also desirable. As such, a performance objective of achieving $P_e$ less than $\eta$ is defined, and the worst channel (ordered by $\epsilon$) for which a decoder with noise level $\alpha$ can achieve that objective is determined. The channel parameter

$$\epsilon^*(\eta, \alpha) = \sup\left\{\epsilon \in \left[0, \tfrac{1}{2}\right] \;\middle|\; \lim_{\ell \to \infty} P_e^{(\ell)}(\epsilon, \alpha) < \eta\right\}$$

is called the threshold. For a large interval of $\eta$ values, there is a single threshold value below which $\eta$-reliable communication is possible and above which it is not. Alternatively, one can determine the probability of error to which a system with particular $\alpha$ and $\epsilon$ can be driven, $\eta^*(\alpha, \epsilon) = \lim_{\ell \to \infty} P_e^{(\ell)}$, and see whether this value is small.
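The threshold $\epsilon^*(\eta, \alpha)$ can be estimated by bisection on $\epsilon$, classifying each channel by whether iterating recursion (1) from $s_0 = \epsilon$ ends below $\eta$. This is a sketch that assumes the set of successful channels is an interval $[0, \epsilon^*)$, which holds in the (3,6) examples discussed here but is not guaranteed in general; the iteration count is a heuristic stand-in for $\ell \to \infty$, and the function names are hypothetical.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def de_step(s, eps, alpha, lam, rho):
    """One step of recursion (1) for the noisy Gallager A decoder."""
    w = (2 * alpha - 1) * (2 * s - 1)              # omega_alpha(s)
    r = evaluate(rho, w)
    q_plus = evaluate(lam, (1 + r - 2 * alpha * r) / 2)
    q_minus = evaluate(lam, (1 - r + 2 * alpha * r) / 2)
    return eps - eps * q_plus + (1 - eps) * q_minus


def final_error(eps, alpha, lam, rho, iters=2000):
    """Approximate eta*(alpha, eps) by iterating (1) from s_0 = eps."""
    s = eps
    for _ in range(iters):
        s = de_step(s, eps, alpha, lam, rho)
    return s


def threshold(eta, alpha, lam, rho, tol=1e-9):
    """Bisection estimate of eps*(eta, alpha) on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if final_error(mid, alpha, lam, rho) < eta:
            lo = mid
        else:
            hi = mid
    return lo


LAM_36, RHO_36 = {3: 1.0}, {6: 1.0}

if __name__ == "__main__":
    for alpha in (0.0, 5e-3):
        print(alpha, threshold(0.1, alpha, LAM_36, RHO_36))
```

With $\eta = 0.1$, these estimates are expected to land close to the $\alpha = 0$ and $\alpha = 5 \times 10^{-3}$ entries of the $\epsilon^*(0.1, \alpha)$ column in Table I.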

In order to find the threshold in the case of $\alpha > 0$ and $\epsilon > 0$, the real fixed point solutions of density evolution recursion (1) need to be found. The real solutions of the polynomial equation in $s$,

$$\epsilon - \epsilon\, q_\alpha^+(s) + (1 - \epsilon)\, q_\alpha^-(s) - s = 0,$$

are denoted $0 < r_1(\alpha, \epsilon) < r_2(\alpha, \epsilon) < r_3(\alpha, \epsilon) < \cdots$. The final probability of error $\eta^*$ is determined by the $r_i$, since these are fixed points of the recursion (1).
The real solutions of the polynomial equation in $s$,

$$\frac{s - q_\alpha^-(s)}{1 - q_\alpha^+(s) - q_\alpha^-(s)} - s = 0, \qquad (3)$$

are denoted $0 < \tau_1(\alpha) < \tau_2(\alpha) < \cdots$. The threshold $\epsilon^*$, as well as the region in the $\alpha$-$\epsilon$ plane where the decoder improves performance over no decoding, is determined by the $\tau_i$, since (3) is obtained by solving the fixed-point form of recursion (1) for $\epsilon$ and setting $\epsilon - s$ equal to zero. For particular ensembles of LDPC codes, these values can be computed analytically. For these particular ensembles, it can be determined whether the fixed points are stable or unstable. Moreover, various monotonicity results can be established to show that fixed points cannot be jumped.
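Equation (3) can also be solved numerically: scan $s$ on a grid for sign changes of $\epsilon(s) - s$, where $\epsilon(s)$ is the channel parameter for which $s$ is a fixed point of (1), then refine each bracket by bisection. This sketch assumes a uniform grid on $(0, 0.49]$ is fine enough to separate the roots, and it reports only sign-changing roots; the helper names are hypothetical.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def q_pair(s, alpha, lam, rho):
    """Return (q_alpha^+(s), q_alpha^-(s)) as defined in the text."""
    w = (2 * alpha - 1) * (2 * s - 1)
    r = evaluate(rho, w)
    return (evaluate(lam, (1 + r - 2 * alpha * r) / 2),
            evaluate(lam, (1 - r + 2 * alpha * r) / 2))


def eps_at_fixed_point(s, alpha, lam, rho):
    """Solve s = eps - eps*q+ + (1 - eps)*q- for eps, as used in (3)."""
    q_plus, q_minus = q_pair(s, alpha, lam, rho)
    return (s - q_minus) / (1 - q_plus - q_minus)


def tau_roots(alpha, lam, rho, s_max=0.49, n=20000):
    """Bracket sign changes of g(s) = eps(s) - s on a grid, refine by bisection."""
    def g(s):
        return eps_at_fixed_point(s, alpha, lam, rho) - s

    grid = [s_max * (k + 1) / n for k in range(n)]   # avoids s = 0
    vals = [g(s) for s in grid]
    roots = []
    for a, b, ga, gb in zip(grid, grid[1:], vals, vals[1:]):
        if ga * gb >= 0:
            continue
        for _ in range(60):
            m = 0.5 * (a + b)
            gm = g(m)
            if ga * gm <= 0:
                b = m
            else:
                a, ga = m, gm
        roots.append(0.5 * (a + b))
    return roots


if __name__ == "__main__":
    print(tau_roots(0.0, {3: 1.0}, {6: 1.0}))
    print(tau_roots(0.005, {3: 1.0}, {6: 1.0}))
```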

Analytical expressions for the $r_i(\alpha, \epsilon)$ and $\tau_i(\alpha)$ are determined for the (3,6) regular LDPC code by solving the appropriate polynomial equations, and numerical evaluations of the $r_i$ expressions are shown as thin lines in Fig. 4 as functions of $\epsilon$ for fixed $\alpha$. The point where $r_1(\alpha, \epsilon) = \epsilon$ is $\tau_1(\alpha)$ and the point where $r_2(\alpha, \epsilon) = \epsilon$ is $\tau_2(\alpha)$. In Fig. 4, these are the points where the thin lines cross.

The number of real solutions can be determined through Descartes' rule of signs or a similar tool [76].
Fig. 4. Thick line shows final error probability, $\eta^*$, after decoding a $\mathcal{C}^\infty(3, 6)$ code with the noisy Gallager A algorithm, $\alpha = 0.005$. This is determined by the fixed points of density evolution, $r_i(\alpha, \epsilon)$, shown with thin lines.



By analyzing the dynamical system equation (1) for the (3,6) code in detail, it can be shown that $r_1(\alpha, \epsilon)$ and $r_3(\alpha, \epsilon)$ are stable fixed points of density evolution. Conversely, $r_2(\alpha, \epsilon)$ is an unstable fixed point, which determines the boundary between the regions of attraction for the two stable fixed points. Since $r_1(\alpha, \epsilon)$ and $r_3(\alpha, \epsilon)$ are stable fixed points, the final error probability $\eta^*$ will take on one of these two values, depending on the starting point of the recursion, $\epsilon$. The thick line in Fig. 4 shows the final error probability $\eta^*$ as a function of initial error probability $\epsilon$. One may note that $\eta^* = r_1$ is the desirable small error probability, whereas $\eta^* = r_3$ is the undesirable large error probability, and that $\tau_2$ delimits these two regimes.

The $\tau(\alpha)$ points determine when it is beneficial to use the decoder, in the sense that $\eta^* < \epsilon$. By varying $\alpha$ (as if in a sequence of plots like Fig. 4), an $\alpha$-$\epsilon$ region where the decoder is beneficial is demarcated; this is shown in Fig. 5. The function $\tau_2(\alpha)$ is the $\eta$-reliability decoding threshold for large ranges of $\eta$.

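Notice that the previously known special case, the decoding threshold of the noiseless decoder, can be recovered from these results. The decoding threshold for the noiseless decoder is denoted $\epsilon^*_{\mathrm{BRU}}$ and is equal to the following expression [74]:

$$\epsilon^*_{\mathrm{BRU}} = \frac{1 - \sqrt{\sigma}}{2},$$

where

$$\sigma = -\frac{1}{4} + \frac{1}{2}\sqrt{-\frac{5}{12} - b} + \frac{1}{2}\sqrt{-\frac{5}{6} + b + \frac{11}{4\sqrt{-\frac{5}{12} - b}}}$$

and

$$b = \frac{8}{3}\left(\frac{2}{83 + 3\sqrt{993}}\right)^{1/3} - \frac{1}{3}\left(\frac{83 + 3\sqrt{993}}{2}\right)^{1/3}.$$

This value is recovered from noisy decoder results by noting that $\eta^*(\alpha = 0, \epsilon) = 0$ for $\epsilon \in [0, \epsilon^*_{\mathrm{BRU}}]$, which are the ordinate intercepts of the region in Fig. 5.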
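The closed form can be evaluated directly. As a consistency check, the sketch also confirms that $\sigma$ is a root of $z^4 + z^3 + z^2 - z - 1$ and that $s = \epsilon^*_{\mathrm{BRU}}$ is a fixed point of the noiseless (3,6) recursion at $\epsilon = \epsilon^*_{\mathrm{BRU}}$. Both checks come from our own derivation rather than the text: setting $s = \epsilon$ in (1) with $\alpha = 0$ and $y = 1 - 2\epsilon$ gives $y^{10} - 2y^4 + 1 = 0$, i.e. $(z - 1)(z^4 + z^3 + z^2 - z - 1) = 0$ with $z = y^2 = \sigma$.

```python
import math


def bru_threshold():
    """Evaluate the closed form for the noiseless (3,6) Gallager A threshold."""
    root = math.sqrt(993.0)
    b = ((8.0 / 3.0) * (2.0 / (83.0 + 3.0 * root)) ** (1.0 / 3.0)
         - (1.0 / 3.0) * ((83.0 + 3.0 * root) / 2.0) ** (1.0 / 3.0))
    w = -5.0 / 12.0 - b
    sigma = (-0.25 + math.sqrt(w) / 2.0
             + math.sqrt(-5.0 / 6.0 + b + 11.0 / (4.0 * math.sqrt(w))) / 2.0)
    return (1.0 - math.sqrt(sigma)) / 2.0, sigma


def gallager_a_36_step(x, eps):
    """Noiseless (alpha = 0) (3,6) Gallager A density evolution update."""
    p = (1.0 - (1.0 - 2.0 * x) ** 5) / 2.0      # check-to-variable error probability
    return eps - eps * (1.0 - p) ** 2 + (1.0 - eps) * p ** 2


eps_bru, sigma = bru_threshold()
print(eps_bru)
print(sigma ** 4 + sigma ** 3 + sigma ** 2 - sigma - 1.0)   # quartic residual
print(gallager_a_36_step(eps_bru, eps_bru) - eps_bru)       # fixed-point residual
```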

To provide a better sense of the performance of the noisy Gallager A algorithm, Table I lists some values of $\alpha$, $\epsilon$, and $\eta^*$ (numerical evaluations are listed, and an example of an analytical expression is given in the Appendix). As can be seen from these results, particularly from the $\tau_2$ curve in Fig. 5, the error probability performance of the system degrades gracefully as noise is added to the decoder.

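Returning to threshold characterization, an analytical expression for the threshold within the region to use decoder is

$$\epsilon^*(\eta, \alpha) = \frac{\eta - q_\alpha^-(\eta)}{1 - q_\alpha^+(\eta) - q_\alpha^-(\eta)},$$

which is the solution to the polynomial equation in $\epsilon$,

$$\epsilon - \epsilon\, q_\alpha^+(\eta) + (1 - \epsilon)\, q_\alpha^-(\eta) - \eta = 0.$$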
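Because the analytic threshold simply solves the fixed-point equation for $\epsilon$, it can be evaluated and checked in a few lines. In this sketch (helper names hypothetical, (3,6) ensemble assumed), evaluating the expression at $\eta = 2.4230 \times 10^{-3}$ and $\alpha = 5 \times 10^{-3}$ should give a value close to $\epsilon = 0.01$, consistent with the last row of Table I.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def q_pair(s, alpha, lam, rho):
    """Return (q_alpha^+(s), q_alpha^-(s)) as defined in the text."""
    w = (2 * alpha - 1) * (2 * s - 1)
    r = evaluate(rho, w)
    return (evaluate(lam, (1 + r - 2 * alpha * r) / 2),
            evaluate(lam, (1 - r + 2 * alpha * r) / 2))


def de_step(s, eps, alpha, lam, rho):
    """One step of recursion (1)."""
    q_plus, q_minus = q_pair(s, alpha, lam, rho)
    return eps - eps * q_plus + (1 - eps) * q_minus


def eps_star_analytic(eta, alpha, lam, rho):
    """Channel parameter for which eta is a fixed point of recursion (1)."""
    q_plus, q_minus = q_pair(eta, alpha, lam, rho)
    return (eta - q_minus) / (1 - q_plus - q_minus)


LAM_36, RHO_36 = {3: 1.0}, {6: 1.0}

if __name__ == "__main__":
    eps = eps_star_analytic(2.4230e-3, 5e-3, LAM_36, RHO_36)
    print(eps)
    print(de_step(2.4230e-3, eps, 5e-3, LAM_36, RHO_36))   # reproduces eta
```

Note that the expression is only meaningful as a threshold when $(\alpha, \epsilon^*)$ lies within the region to use decoder; outside it, recursion (1) started at $s_0 = \epsilon$ is attracted to a different fixed point.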
Fig. 5. Decoding a $\mathcal{C}^\infty(3, 6)$ code with the noisy Gallager A algorithm. Region where it is beneficial to use decoder is below $\tau_2$ and above $\tau_1$.



TABLE I
Performance of Noisy Gallager A Algorithm for (3,6) Code

α             ε*(0.1, α)      η*(α, ε*)          η*(α, 0.01)
0             0.0394636562    0                  0
1 × 10^-10    0.0394636560    7.8228 × 10^-11    1.3333 × 10^-11
1 × 10^-8     0.0394636335    7.8228 × 10^-9     1.3333 × 10^-9
1 × 10^-6     0.0394613836    7.8234 × 10^-7     1.3338 × 10^-7
1 × 10^-4     0.0392359948    7.8866 × 10^-5     1.3812 × 10^-5
3 × 10^-4     0.0387781564    2.4050 × 10^-4     4.4357 × 10^-5
1 × 10^-3     0.0371477336    8.4989 × 10^-4     1.8392 × 10^-4
3 × 10^-3     0.0321984070    3.0536 × 10^-3     9.2572 × 10^-4
5 × 10^-3     0.0266099758    6.3032 × 10^-3     2.4230 × 10^-3



The threshold is drawn for several values of $\eta$ in Fig. 6. A threshold line determines the equivalence of channel noise and decoder noise with respect to final probability of error. If, for example, the binary symmetric channels in the system are a result of hard-detected AWGN channels, such a line may be used to derive the equivalent channel noise power for a given decoder noise power, or vice versa. Threshold lines therefore provide guidelines for power allocation in communication systems.
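For the power-allocation use mentioned above, the link between a BSC crossover probability and an AWGN noise level has to be made explicit. A common model, assumed here rather than taken from the text, is antipodal signalling with amplitude 1 and hard detection, for which $\epsilon = Q(1/\sigma)$; the sketch below converts in both directions (function names hypothetical).

```python
import math


def crossover_from_noise_std(sigma, amplitude=1.0):
    """BSC crossover probability of hard-detected antipodal signalling in AWGN.

    epsilon = Q(amplitude / sigma), with Q(x) = erfc(x / sqrt(2)) / 2.
    """
    return 0.5 * math.erfc(amplitude / (sigma * math.sqrt(2.0)))


def noise_std_from_crossover(eps, amplitude=1.0, lo=1e-6, hi=1e6):
    """Invert crossover_from_noise_std by bisection on log(sigma).

    Valid for 0 < eps < 1/2, where the crossover probability increases with sigma.
    """
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if crossover_from_noise_std(mid, amplitude) < eps:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Channel noise power matching a crossover probability read off a threshold line.
    sigma = noise_std_from_crossover(0.0394636562)
    print(sigma ** 2)
```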
[Fig. 6 curve labels: $\eta = 0.001$, $\eta = 0.0005$, $\eta = 0.00005$, $\eta = 0.00001$]

Fig. 6. $\eta$-thresholds (gray lines) for decoding a $\mathcal{C}^\infty(3, 6)$ code with the noisy Gallager A algorithm within the region to use decoder (delimited with red line).
Fig. 7. Region to use decoder for Bazzi et al.'s optimized rate 1/2 LDPC code with noisy Gallager A decoding (black) is contained within the region to use decoder for a rate 1/2 LDPC code in Bazzi et al.'s optimal family of codes with $a = 1/10$ (green) and contains the region to use decoder for the $\mathcal{C}^\infty(3, 6)$ code (gray).

C. Code Optimization 

At this point, the bit error performance of a system has simply been measured; no attempt has been made to optimize a code for a particular decoder and set of parameters. For fault-free decoding, it has been demonstrated that irregular code ensembles can perform much better than regular code ensembles like the (3,6) LDPC code considered above [74], [77]. One might hope for similar improvements when LDPC code design takes decoder noise into account. The space of system parameters to be considered for noisy decoders is much larger than for noiseless decoders.

As a first step, consider the ensemble of rate 1/2 LDPC codes that were optimized by Bazzi et al. for the fault-free Gallager A decoding algorithm [74]. The left degree distribution is

$$\lambda(\zeta) = a\zeta^2 + (1 - a)\zeta^3$$

and the right degree distribution is

$$\rho(\zeta) = \frac{7a}{3}\zeta^6 + \frac{3 - 7a}{3}\zeta^7,$$

where the optimal $a$ is specified analytically. Numerically, $a_{\mathrm{opt}} = 0.1115\ldots$. Measuring the performance of this code with the noisy Gallager A decoder yields the region to use decoder shown in Fig. 7; the region to use decoder for the (3,6) code is shown for comparison. By essentially any criterion of performance, this optimized code is better than the (3,6) code.
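A quick sanity check of these ensembles is their design rate $1 - \int_0^1 \rho \,/\, \int_0^1 \lambda$. The sketch below assumes the right degree distribution $\rho(\zeta) = \frac{7a}{3}\zeta^6 + \frac{3 - 7a}{3}\zeta^7$ (our reading of Bazzi et al.'s family) and uses exact rational arithmetic; under that assumption the rate comes out to exactly 1/2 for every admissible $a$. Function names are hypothetical.

```python
from fractions import Fraction


def integral(dist):
    """Integral over [0, 1] of sum_d f_d * z**(d - 1), i.e. sum_d f_d / d."""
    return sum(Fraction(f) / d for d, f in dist.items())


def design_rate(lam, rho):
    """Design rate 1 - (integral of rho) / (integral of lambda)."""
    return 1 - integral(rho) / integral(lam)


def bru_family(a):
    """Assumed family: lambda = a z^2 + (1 - a) z^3, rho = (7a/3) z^6 + (1 - 7a/3) z^7."""
    a = Fraction(a)
    return {3: a, 4: 1 - a}, {7: 7 * a / 3, 8: 1 - 7 * a / 3}


if __name__ == "__main__":
    for a in (Fraction(1, 10), Fraction(1115, 10000)):
        print(a, design_rate(*bru_family(a)))
    print(design_rate({3: 1}, {6: 1}))   # (3,6) regular ensemble for comparison
```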

Are there other codes that can perform better on the faulty decoder than the code optimized for the fault-free decoder? To see whether this is possible, arbitrarily restrict to the family of ensembles that were found to contain the optimal degree distribution for the fault-free decoder and take $a = 1/10$. Also let $\alpha = 1/500$ be fixed. The numerical value of the threshold is $\epsilon^*_{1/10}(1/10, \alpha) = 0.048239$, whereas the numerical value of the threshold $\epsilon^*_{a_{\mathrm{opt}}}(1/10, \alpha) = 0.047857$. In this sense, the $a = 1/10$ code is better than the $a = a_{\mathrm{opt}}$ code. In fact, as seen in Fig. 7, the region to use decoder for this $a = 1/10$ code contains the region to use decoder for the $a_{\mathrm{opt}}$ code.

On the other hand, the final error probability when operating at threshold for the $a = 1/10$ code is $\eta^*_{1/10}(\alpha, \epsilon^*_{1/10}(1/10, \alpha)) = 0.01869$, whereas the final error probability when operating at threshold for the $a = a_{\mathrm{opt}}$ code is $\eta^*_{a_{\mathrm{opt}}}(\alpha, \epsilon^*_{a_{\mathrm{opt}}}(1/10, \alpha)) = 0.01766$. So in this sense, the $a = a_{\mathrm{opt}}$ code is better than the $a = 1/10$ code. The main complication is that highly optimized ensembles usually lead to more simultaneous critical points.

If both threshold and final bit error probability are performance criteria, there is no total order on codes and 
therefore there may be no notion of an optimal code. 

VI. Example: Noisy Gaussian Decoder 

It is also of interest to analyze a noisy version of the belief propagation decoder applied to the output of a continuous-alphabet channel. Density evolution for belief propagation is difficult to analyze even in the noiseless decoder case, and so a Gaussian approximation method [78] is used. The state variables are one-dimensional rather than infinite-dimensional as for full analysis of belief propagation. The specific node computations carried out by the decoder are as in belief propagation [13]; these can be approximated by the functions $\Phi$ and $\Psi$ defined below. The messages and noise model are specified in terms of the approximation.

Section V had considered decoding the output of a BSC with a decoder that was constructed with BSC components, and Proposition 1 had shown that probability of bit error could never be driven to zero. Here, the probability of bit error does in fact go to zero.

Consider a binary input AWGN channel with variance $\epsilon^2$. The output is decoded using a noisy Gaussian decoder. For simplicity, only regular LDPC codes are considered. The messages that are passed in this decoder are real-valued, $\mathcal{M} = \mathbb{R} \cup \{\pm\infty\}$, and are in belief format.

The variable-to-check messages in the zeroth iteration are the log-likelihood ratios computed from the channel output symbols, $\nu(y)$,

$$\nu_{v \to c} = \nu(y) = \log \frac{p(y \mid x = 1)}{p(y \mid x = -1)}.$$

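The check node takes the received versions of these messages, $\mu_{v \to c}$, as input. The node implements a mapping $\Phi$ whose output, $\nu_{c \to v}$, satisfies

$$\operatorname{etanh}(\nu_{c \to v}) = \prod_{i=1}^{d_c - 1} \operatorname{etanh}(\mu_{v \to c_i}),$$

where the product is taken over messages on all incoming edges except the one on which the message will be outgoing, and

$$\operatorname{etanh}(\check{v}) = \frac{1}{\sqrt{4\pi\check{v}}} \int_{\mathbb{R}} \tanh\left(\frac{v}{2}\right) e^{-\frac{(v - \check{v})^2}{4\check{v}}}\, dv.$$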

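The check node mapping is motivated by Gaussian likelihood computations. For the sequel, it is useful to define a slightly different function

$$\phi(\check{v}) = \begin{cases} 1 - \operatorname{etanh}(\check{v}), & \check{v} > 0,\\ 1, & \check{v} = 0, \end{cases}$$

which can be approximated as

$$\phi(\check{v}) \approx e^{a\check{v}^c + b},$$

with $a = -0.4527$, $b = 0.0218$, $c = 0.86$ [78].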
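The integral defining etanh and the curve fit for $\phi$ can be compared numerically. This sketch evaluates the integral with composite Simpson's rule over $\pm 12$ standard deviations of the $\mathcal{N}(\check{v}, 2\check{v})$ kernel (truncation and grid size are our own choices, not from the text) and prints it next to $e^{a\check{v}^c + b}$; function names are hypothetical.

```python
import math

A, B, C = -0.4527, 0.0218, 0.86   # the constants a, b, c quoted in the text


def etanh(v_check, n=4000):
    """(1 / sqrt(4 pi v)) * integral of tanh(v/2) exp(-(v - v)^2 / (4 v)), by Simpson's rule."""
    sd = math.sqrt(2.0 * v_check)                  # std of N(v_check, 2 v_check)
    lo, hi = v_check - 12.0 * sd, v_check + 12.0 * sd
    h = (hi - lo) / n                              # n must be even for Simpson's rule
    total = 0.0
    for k in range(n + 1):
        v = lo + k * h
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += weight * math.tanh(v / 2.0) * math.exp(-(v - v_check) ** 2 / (4.0 * v_check))
    return total * h / 3.0 / math.sqrt(4.0 * math.pi * v_check)


def phi_numeric(v_check):
    """phi from its definition, using the numerical etanh."""
    return 1.0 if v_check == 0 else 1.0 - etanh(v_check)


def phi_fit(v_check):
    """The curve fit exp(a * v**c + b)."""
    return math.exp(A * v_check ** C + B)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0, 9.0):
        print(x, phi_numeric(x), phi_fit(x))
```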

For iterations $\ell \ge 1$, the variable node takes the received versions of the $c \to v$ messages, $\mu_{c \to v}$, as inputs. The mapping $\Psi$ yields output $\nu_{v \to c}$ given by

$$\nu_{v \to c} = \nu(y) + \sum_{i=1}^{d_v - 1} \mu_{c_i \to v},$$

where the sum is taken over received messages from the neighboring check nodes except the one to which this message is outgoing. Again, the operation of the variable node is motivated by Gaussian likelihood computations.

As in Section V, local computation noise is combined into message-passing noise (Fig. 3). To model quantization or random phenomena, consider each message passed in the decoder to be corrupted by signal-independent additive noise which is bounded as $-\alpha/2 \le w \le \alpha/2$. This class of noise models includes uniform noise and truncated Gaussian noise, among others. If the noise is symmetric, then Theorem 1 applies. Following the von Neumann error model, each noise realization $w$ is assumed to be independent.



A. Density Evolution Equation 

The definition of the computation rules and the noise model may be used to derive the approximate density evolution equation. The one-dimensional state variable chosen to be tracked is $s$, the mean belief at a variable node. The symmetry condition relating mean belief to belief variance [13], [78] is enforced. Thus, if the all-one codeword was transmitted, then the value $s$ going to $+\infty$ implies that the density of $\nu_{v \to c}$ tends to a "mass point at infinity," which in turn implies that $P_e$ goes to 0.
To bound decoding performance under any noise model in the class of additive bounded noise, consider (non-stochastic) worst-case noise. Assuming that the all-one codeword was sent, all messages should be as positive as possible to move towards the correct decoded codeword (mean beliefs of $+\infty$ indicate perfect confidence in a bit being 1). Consequently, the worst bounded noise that may be imposed is to subtract $\alpha/2$ from all messages that are passed; this requires knowledge of the transmitted codeword being all-one. If another codeword is transmitted, then certain messages would have $\alpha/2$ added instead of subtracted.

Such a worst-case noise model does not meet the conditions of Theorem 1, but transmission of the all-one codeword is assumed nonetheless. If there were an adversary with knowledge of the transmitted codeword imposing worst-case noise on the decoder, then probability of bit error would be conditionally independent of the transmitted codeword, as given in Appendix A.

Note that the adversary is restricted to selecting each noise realization independently. More complicated and 
devious error patterns in space or in time are not possible in the von Neumann error model. Moreover, the 
performance criterion is probability of bit error rather than probability of block error, so complicated error patterns 
would provide no great benefit to the adversary. 

Since the noise is conditionally deterministic given the transmitted codeword, derivation of the density evolution equation is much simplified. An induction argument is used, and the base case is

$$s_0 = \frac{2}{\epsilon^2},$$

where $\epsilon^2$ is the channel noise power. This follows from the log-likelihood computation for an AWGN communication channel with input alphabet $\mathcal{X} = \{\pm 1\}$.

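The inductive assumption in the induction argument is $s_{\ell-1}$. This message is communicated over message-passing noise to get

$$s_{\ell-1} - \frac{\alpha}{2}.$$

Next the check node computation is made to yield

$$\phi^{-1}\left(1 - \left[1 - \phi\left(s_{\ell-1} - \frac{\alpha}{2}\right)\right]^{d_c - 1}\right).$$

By the inductive assumption, all messages will be equivalent; that is why the product is a $(d_c - 1)$-fold product of the same quantity. This value is communicated over message-passing noise to get

$$\phi^{-1}\left(1 - \left[1 - \phi\left(s_{\ell-1} - \frac{\alpha}{2}\right)\right]^{d_c - 1}\right) - \frac{\alpha}{2}.$$

Finally the variable-node computation yields

$$s_0 + (d_v - 1)\left\{\phi^{-1}\left(1 - \left[1 - \phi\left(s_{\ell-1} - \frac{\alpha}{2}\right)\right]^{d_c - 1}\right) - \frac{\alpha}{2}\right\}.$$

Again, all messages will be equivalent, so the sum is a $(d_v - 1)$-fold sum of the same quantity. Thus the density evolution equation is

$$s_\ell = \frac{2}{\epsilon^2} + (d_v - 1)\left\{\phi^{-1}\left(1 - \left[1 - \phi\left(s_{\ell-1} - \frac{\alpha}{2}\right)\right]^{d_c - 1}\right) - \frac{\alpha}{2}\right\}. \qquad (4)$$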
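Recursion (4) can be iterated numerically using the curve fit for $\phi$. The sketch below makes several assumptions that are not in the text: $\phi$ is clamped to 1 near zero (where the fit exceeds 1), negative received means are clamped to 0, a mean belief above a cap of $10^3$ is treated as having diverged to $+\infty$, and bisection over the noise standard deviation assumes that success is monotone in $\epsilon$. With these choices, $\alpha = 0$, and the (3,6) ensemble, $\epsilon = 0.8$ diverges while $\epsilon = 1.0$ stalls at a small mean belief. Function names are hypothetical.

```python
import math

A, B, C = -0.4527, 0.0218, 0.86   # curve-fit constants a, b, c from the text


def phi(x):
    """Curve-fit approximation of phi, clamped to 1 at and near x = 0."""
    if x <= 0.0:
        return 1.0
    return min(1.0, math.exp(A * x ** C + B))


def phi_inv(y):
    """Inverse of the curve fit; y >= 1 maps to mean belief 0."""
    if y >= 1.0:
        return 0.0
    if y <= 0.0:
        return math.inf
    return ((math.log(y) - B) / A) ** (1.0 / C)


def de_step(s, eps, alpha, dv, dc):
    """One step of recursion (4) with worst-case noise -alpha/2 on every message."""
    m_in = max(s - alpha / 2.0, 0.0)               # received v->c mean, clamped at 0
    p = phi(m_in)
    # 1 - (1 - p)**(dc - 1), computed with log1p/expm1 to avoid cancellation
    t = 1.0 if p >= 1.0 else -math.expm1((dc - 1) * math.log1p(-p))
    m_cv = phi_inv(t) - alpha / 2.0                # check output after wire noise
    return 2.0 / eps ** 2 + (dv - 1) * m_cv


def final_mean(eps, alpha, dv=3, dc=6, iters=2000, cap=1e3):
    """Iterate (4) from s_0 = 2 / eps^2; return inf once s exceeds cap."""
    s = 2.0 / eps ** 2
    for _ in range(iters):
        if s > cap:
            return math.inf
        s = de_step(s, eps, alpha, dv, dc)
    return math.inf if s > cap else s


def threshold_sigma(alpha, dv=3, dc=6, lo=0.3, hi=1.5, steps=30):
    """Bisection estimate of eps*(alpha), assuming success is monotone in eps."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if final_mean(mid, alpha, dv, dc) == math.inf:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(final_mean(0.8, 0.0), final_mean(1.0, 0.0))
    print(threshold_sigma(0.0), threshold_sigma(0.5))
```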

B. Performance Evaluation 

One might wonder whether there are sets of noise parameters $\alpha > 0$ and $\epsilon > 0$ such that $s_\ell \to +\infty$. Indeed there are, and there is a threshold phenomenon just like the one Chung et al. showed for $\alpha = 0$ [78].

Proposition 2: Final error probability $\eta^* = 0$ for LDPC ensembles decoded using the noisy Gaussian system defined in Section VI, for binary-input AWGN channels with noise level $\epsilon < \epsilon^*(\alpha)$.

Proof: Substituting $s = +\infty$ into (4) demonstrates that it is a stable fixed point. It may further be verified that the dynamical system proceeds toward that fixed point if $\epsilon < \epsilon^*(\alpha)$. ■

Unlike Section V, where the $\epsilon^*(\eta, \alpha)$ thresholds could be evaluated analytically, only numerical evaluations of these $\epsilon^*(\alpha)$ thresholds are possible. These are shown in Fig. 8 for three regular LDPC ensembles with rate 1/2, namely the (3,6), (4,8), and (5,10) ensembles. As can be observed, thresholds decrease smoothly as the decoder noise level increases. Moreover, the ordering of the codes remains the same for all levels of decoder noise depicted. Code optimization remains to be done.
Fig. 8. Thresholds for decoding the $\mathcal{C}^\infty(3, 6)$ code (triangle), the $\mathcal{C}^\infty(4, 8)$ code (quadrangle), and the $\mathcal{C}^\infty(5, 10)$ code (pentangle), each with the noisy Gaussian approximation algorithm. Notice that the ordinate intercepts are $\epsilon^*_{\mathrm{CRU}}(3,6) = 0.8747$, $\epsilon^*_{\mathrm{CRU}}(4,8) = 0.8323$, and $\epsilon^*_{\mathrm{CRU}}(5,10) = 0.7910$ [78, Table I].

The basic reason for the disparity between Propositions 2 and 1 is that here, the noise is bounded whereas the messages are unbounded. Thus once the messages grow large, the noise has essentially no effect. To use a term from [67], once the decoder reaches the breakout value, noise cannot stop the decoder from achieving Shannon reliability.

Perhaps a peak amplitude constraint on messages would provide a more realistic computation model, but the equivalent of Proposition 2 may not hold. Quantified data processing inequalities may provide insight into what forms of noise and message constraints are truly limiting [34], [35].

VII. Application: Reliable Memories Constructed from Unreliable Components 

In Section I, complexity and reliability were cast as the primary limitations on practical decoding. By considering the design of fault masking techniques for memory systems, a communication problem beyond Fig. 1, both complexity and reliability may be explicitly constrained. Indeed, the problem of constructing reliable information storage devices from unreliable components is central to fault-tolerant computing, and determining the information storage capacity of such devices is a long-standing open problem [79]. This problem is related to problems in distributed information storage [80] and is intimately tied to the performance of codes under faulty decoding. The analysis techniques developed thus far may be used directly.

In particular, one may construct a memory architecture with noisy registers and a noisy LDPC correcting network. At each time step, the correcting network decodes the register contents and restores them. The correcting network prevents the codeword stored in the registers from wandering too far away. Taylor and others have shown that there exist non-zero levels of component noisiness such that the LDPC-based construction achieves non-zero storage capacity [54], [55], [63]. Results as in Section V may be used to precisely characterize storage capacity.

Before proceeding with an achievability result, requisite definitions and the problem statement are given [54].

Definition 6: An elementary operation is any Boolean function of two binary operands. 

Definition 7: A system is considered to be constructed from components, which are devices that either perform 
one elementary operation or store one bit. 

Definition 8: The complexity $\chi$ of a system is the number of components within the system.

Definition 9: A memory system that stores k information bits is said to have an information storage capability 
of k. 

Definition 10: Consider a sequence of memories $\{M_k\}$, ordered according to their information storage capability $k$ (bits). The sequence $\{M_k\}$ is stable if it satisfies the following:

1) For any $k$, $M_k$ must have $2^k$ allowed inputs denoted $\{I_{k_i}\}$, $1 \le i \le 2^k$.

2) A class of states, $C(I_{k_i})$, is associated with each input $I_{k_i}$ of $M_k$. The classes $C(I_{k_i})$ and $C(I_{k_j})$ must be disjoint for all $i \ne j$ and all $k$.

3) The complexity of $M_k$, $\chi(M_k)$, must be bounded by $\theta k$, where redundancy $\theta$ is fixed for all $k$.

4) At $\ell = 0$, let one of the inputs from $\{I_{k_i}\}$ be stored in each memory $M_k$ in the sequence of memories $\{M_k\}$, with no further inputs at times $\ell > 0$. Let $I_{k_i}$ denote the particular input stored in memory $M_k$. Let $\lambda_{k_i}(T)$ denote the probability that the state of $M_k$ does not belong to $C(I_{k_i})$ at $\ell = T$, and further let $P_k^{\max}(T) = \max_i \lambda_{k_i}(T)$. Then for any $T > 0$ and $\delta > 0$, there must exist a $k$ such that $P_k^{\max}(T) < \delta$.

The demarcation of classes of states is equivalent to demarcating decoding regions.

Definition 11: The storage capacity, $C$, of memory is a number such that there exist stable memory sequences for all memory redundancy values $\theta$ greater than $1/C$.

Note that unlike channel capacity for the communication problem, there is no informational definition of storage 
capacity that is known to go with the operational definition. 

The basic problem then is to determine storage capacity, which is a measure of the circuit complexity required to achieve arbitrarily reliable information storage. The circuit complexity must be linear in blocklength, a property satisfied by systems with message-passing correcting networks for LDPC codes.

Although Proposition 1 shows that Shannon reliability is not achievable for any noisy Gallager A decoder, the definition of stable information storage does not require this. By only requiring maintenance within a decoding region, the definition implies that either the contents of the memory may be read out in coded form or, equivalently, that there is a noiseless output device that yields decoded information; call this noiseless output device the silver decoder.

Consider the construction of a memory with noisy registers as storage elements. These registers are connected to a noisy Gallager A LDPC decoder (as described in Section V), which takes the register values as inputs and stores its computational results back into the registers. To find the storage capacity of this construction, first compute the complexity (presupposing that the construction will yield a stable sequence of memories).

The Gallager A check node operation is a $(d_c - 1)$-input XOR gate, which may be constructed from $d_c - 2$ two-input XOR gates. A variable node determines whether its $d_v - 1$ inputs are all the same and then compares to the original received value. Let $D_{d_v}$ denote the complexity of this logic. The output of the comparison to the original received value is the value of the consensus view. One construction to implement the consensus logic is to OR together the outputs of a $(d_v - 1)$-input AND gate and a $(d_v - 1)$-input AND gate with inverted inputs. This is then XORed with the stored value. Such a circuit can be implemented with $2(d_v - 2) + 2$ components, so $D_{d_v} = 2d_v - 2$. The storage is carried out in $n$ registers. The total complexity of the memory $M_k$, $\chi(M_k)_{\mathcal{C}^n(d_v, d_c)}$, is

$$\chi(M_k)_{\mathcal{C}^n(d_v, d_c)} = n\left(1 + 2d_v - 2 + d_v(d_c - 2)\right) = n(d_v d_c - 1).$$

The information storage capability is $n$ times the rate of the code, $R$. The complexity of an irredundant memory with the same storage capability is $\chi_{\mathrm{irr}_n} = Rn$. Hence, the redundancy is

$$\frac{\chi(M_k)_{\mathcal{C}^n(d_v, d_c)}}{\chi_{\mathrm{irr}_n}} = \frac{n(d_v d_c - 1)}{Rn} \le \frac{d_v d_c - 1}{1 - d_v/d_c},$$

which is a constant. By [65, Lemma 3.22], the inequality almost holds with equality with high probability for large $n$. For the (3,6) regular LDPC code, the redundancy value is 34, so $C = 1/34$, if the construction does in fact yield stable memories.
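The component count and the resulting redundancy can be tabulated with exact arithmetic. The helper names below are hypothetical; the counts follow the construction described above, and the redundancy uses the design rate $1 - d_v/d_c$ as a lower bound on $R$.

```python
from fractions import Fraction


def consensus_logic_components(dv):
    """D_{d_v}: two (d_v - 1)-input AND gates built from 2-input gates, plus one OR and one XOR."""
    return 2 * (dv - 2) + 2


def components_per_register(dv, dc):
    """Per-register count used in the text: 1 register + D_{d_v} + d_v (d_c - 2) XOR gates."""
    return 1 + consensus_logic_components(dv) + dv * (dc - 2)


def redundancy_bound(dv, dc):
    """Upper bound (d_v d_c - 1) / (1 - d_v / d_c) on chi(M_k) / chi_irr."""
    return Fraction(components_per_register(dv, dc)) / (1 - Fraction(dv, dc))


if __name__ == "__main__":
    theta = redundancy_bound(3, 6)
    print(components_per_register(3, 6), theta, 1 / theta)   # 17 components, theta = 34, C = 1/34
```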

The conditions under which the memory is stable depend on the silver decoder. Since silver decoder complexity does not enter, maximum likelihood decoding should be used. The Gallager lower bound to the ML decoding threshold for the (3,6) regular LDPC code is $\epsilon^*_{\mathrm{GLB}} = 0.0914755$ [81, Table II]. Recall from Fig. 5 that the decoding threshold for Gallager A decoding is $\epsilon^*_{\mathrm{BRU}} = 0.0394636562$.

If the probability of bit error for the correcting network in the memory stays within the decoding threshold of the silver decoder, then stability follows. Thus the question reduces to determining the sets of component noisiness levels $(\alpha, \epsilon)$ for which the decoding circuit achieves $(\eta = \epsilon^*_{\mathrm{GLB}})$-reliability.

Consider a memory system where bits are stored in registers with probability $\alpha_r$ of flipping at each time step. An LDPC codeword is stored in these registers; the probability of incorrect storage at the first time step is $\epsilon$. At each iteration, the variable node value from the correcting network is placed in the register. This stored value is used in the subsequent Gallager A variable node computation rather than a received value from the input pins. Suppose that the component noise values in the correcting network may be parameterized as in Section V. Then a slight modification of the analysis in Section V yields a density evolution equation

$$s_{\ell+1} = \epsilon_2 - \epsilon_2\, q_\alpha^+(s_\ell) + (1 - \epsilon_2)\, q_\alpha^-(s_\ell),$$

where $\epsilon_2 = s_\ell(1 - \alpha_r) + \alpha_r(1 - s_\ell)$. There is a "region to use decoder" for this system, just as in Section V. If $\alpha_r = \alpha$, this region is shown in Fig. 9 and is slightly smaller than the region in Fig. 5. Denote this region and its hypograph as $\mathfrak{R}$. It follows that $(\eta = \epsilon^*_{\mathrm{BRU}})$-reliability is achieved for $\mathfrak{R}$. Since $\epsilon^*_{\mathrm{BRU}}$-reliability is achievable, $\epsilon^*_{\mathrm{GLB}}$-reliability is achievable by monotonicity. Thus the construction yields stable memories.

Fig. 9. For a memory system constructed with noisy registers and a (3, 6) LDPC Gallager A correcting network, the region $\mathfrak{R}$ (delimited by black line) comprises the "region to use decoder" and its hypograph.
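The memory recursion differs from (1) only in replacing $\epsilon$ with $\epsilon_2$, so it can be iterated the same way. In this sketch, the (3,6) ensemble with $\alpha_r = \alpha = 10^{-3}$ is an illustrative choice, the starting value $s_0 = \epsilon$ is the probability of incorrect storage at the first time step, and a fixed iteration count stands in for many time steps; function names are hypothetical.

```python
def evaluate(dist, z):
    """Edge-perspective polynomial: sum over degrees d of dist[d] * z**(d - 1)."""
    return sum(frac * z ** (deg - 1) for deg, frac in dist.items())


def q_pair(s, alpha, lam, rho):
    """Return (q_alpha^+(s), q_alpha^-(s)) as defined in Section V."""
    w = (2 * alpha - 1) * (2 * s - 1)
    r = evaluate(rho, w)
    return (evaluate(lam, (1 + r - 2 * alpha * r) / 2),
            evaluate(lam, (1 - r + 2 * alpha * r) / 2))


def memory_step(s, alpha, alpha_r, lam, rho):
    """One step of the memory recursion: eps_2 replaces eps in (1)."""
    eps2 = s * (1 - alpha_r) + alpha_r * (1 - s)   # register error after a BSC(alpha_r) flip
    q_plus, q_minus = q_pair(s, alpha, lam, rho)
    return eps2 - eps2 * q_plus + (1 - eps2) * q_minus


def memory_final(eps, alpha, alpha_r, lam, rho, iters=2000):
    """Iterate the memory recursion from s_0 = eps."""
    s = eps
    for _ in range(iters):
        s = memory_step(s, alpha, alpha_r, lam, rho)
    return s


LAM_36, RHO_36 = {3: 1.0}, {6: 1.0}

if __name__ == "__main__":
    print(memory_final(0.02, 1e-3, 1e-3, LAM_36, RHO_36))   # contents stay near the codeword
    print(memory_final(0.10, 1e-3, 1e-3, LAM_36, RHO_36))   # contents drift away
```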

Proposition 3: Let $\mathfrak{R}$ be the set of memory component noise parameters $(\alpha, \epsilon)$ within the region to use decoder or its hypograph corresponding to a system with a Gallager A correcting network for the (3, 6) LDPC code, depicted in Fig. 9. Then a sequence of memories constructed from $\mathfrak{R}$-components has a storage capacity lower bounded as $C \ge 1/34$.

This may be directly generalized for any choice of code ensemble as follows. 

Theorem 5: Let $\mathfrak{R}$ be the (computable) set of memory component noise parameters $(\alpha, \epsilon)$ within the region to use decoder or its hypograph corresponding to a system with a Gallager A correcting network for the $(\lambda, \rho)$ LDPC code. Then a sequence of memories constructed from $\mathfrak{R}$-components has a storage capacity lower bounded as

$$C \ge \frac{1 - \lambda'(1)/\rho'(1)}{\lambda'(1)\rho'(1) - 1}.$$

The bound reduces to $(1 - d_v/d_c)/(d_v d_c - 1)$ for regular codes.

This theorem gives a precise achievability result that bounds storage capacity. It also implies a code ensemble optimization problem similar to the one in Section V-C. The question of an optimal architecture for memory systems, however, remains open.

VIII. Conclusions 

Loeliger et al. had observed that decoders are robust to nonidealities and noise in physical implementations; however, they had noted that "the quantitative analysis of these effects is a challenging theoretical problem." This work has taken steps to address this challenge by characterizing robustness to decoder noise.

The extension of the density evolution method to the case of faulty decoders allows a simplified means of asymptotic performance characterization. Results from this method show that in certain cases Shannon reliability is not achievable (Proposition 1), whereas in other cases it is achievable (Proposition 2). In either case, however, the degradation of a suitably defined decoding threshold is smooth with increasing decoder noise, whether in circuit nodes or circuit wires. Due to this smoothness, codes optimized for fault-free decoders do work well with faulty decoders; however, optimization of codes for systems with faulty decoders remains to be studied.

No attempt was made to apply fault masking methods to develop decoding algorithms with improved performance in the presence of noise. One approach might be to use coding within the decoder so as to reduce the values of $\alpha$. Of course, the within-decoder code would need to be decoded. There are also more direct circuit-oriented techniques that may be applied [?], [?]. Following the concept of concatenated codes, concatenated decoders may also be promising. The basic idea of using a first (noiseless) decoder to correct many errors and then a second (noiseless) decoder to clean things up was already present in [61], but it may be extended to the faulty decoder setting.

Reducing power consumption in decoder circuits has been an active area of research [37], [84]-[90]; however, power reduction often has the effect of increasing noise in the decoder [91]. The tradeoff developed between the quality of the communication channel and the quality of the decoder may provide guidelines for allocating resources in communication system design.

Analysis of other decoding algorithms with other error models will presumably yield results similar to those obtained here. For greater generality, one might move beyond simple LDPC codes and consider arbitrary codes decoded with very general iterative decoding circuits [90] with suitable error models. An even more general model of computation such as a Turing machine or beyond [92] does not seem to have an obvious, appropriate error model.

Even just a bit of imagination provides numerous models of channel noise and circuit faults that may be 
investigated in the future to provide further insights into the fundamental limits of noisy communication and 
computing. 
hosting my visit at EPFL. I also thank the anonymous reviewers, Sanjoy K. Mitter, G. David Forney, and Vivek
K Goyal for assistance in improving the paper. Thanks also to Shashi Kiran Chilappagari for telling me about his 
work. 



Appendix A
Proof of Theorem 1

Let $x$ be a codeword and let $Y$ denote the corresponding channel output $Y = xZ$ (where the notation means pointwise multiplication on length-$n$ vectors). Note that $Z$ is equal to the channel output observation when $x$ is all-one. The goal is to show that messages sent during the decoding process for cases when the received codeword is either $xZ$ or $Z$ correspond.

Let $n_i$ be an arbitrary variable node and let $n_j$ be one of its neighboring check nodes. Let $\nu_{ij}^{(\ell)}(y)$ and $\mu_{ij}^{(\ell)}(y)$ denote the variable-to-check message from $n_i$ to $n_j$ at the respective terminals in iteration $\ell$, assuming received value $y$. Similarly, let $\nu_{ji}^{(\ell)}(y)$ and $\mu_{ji}^{(\ell)}(y)$ be the check-to-variable message from $n_j$ to $n_i$ at the respective terminals in iteration $\ell$, assuming received value $y$.

By Definition 2, the channel is memoryless binary-input output-symmetric, and it may be modeled multiplicatively as

$$Y_t = x_t Z_t, \qquad (5)$$

where $\{Z_t\}$ is a sequence of i.i.d. random variables and $t$ is the channel usage time. The validity of the multiplicative model is shown in [13, p. 605] and [65, p. 184].

The proof proceeds by induction, and so the base case is established first. By the multiplicative model (5), $\nu_{ij}^{(0)}(y) = \nu_{ij}^{(0)}(xz)$. Recalling that $x_i \in \{\pm 1\}$, by the variable node symmetry condition (Definition 3), which includes computation noise $u_{n_i}^{(0)}$, it follows that $\nu_{ij}^{(0)}(y) = \nu_{ij}^{(0)}(xz) = x_i \nu_{ij}^{(0)}(z)$.

Now take the wire noise $w_{ij}^{(0)}$ on the message from $n_i$ to $n_j$ into account. It is symmetric (Definition 5), and so $\nu_{ij}^{(0)}(y) = x_i \nu_{ij}^{(0)}(z)$ implies a similar property for $\mu_{ij}^{(0)}$. In particular,

$$\mu_{ij}^{(0)}(y) = \Xi\left(\nu_{ij}^{(0)}(y), w_{ij}^{(0)}\right) = \Xi\left(x_i \nu_{ij}^{(0)}(z), w_{ij}^{(0)}\right) = x_i\, \Xi\left(\nu_{ij}^{(0)}(z), x_i w_{ij}^{(0)}\right), \qquad (6)$$

where the last step follows because $x_i \in \{\pm 1\}$, and so it can be taken outside of $\Xi$ by Definition 5 when it is put back in for the wire noise. Now since $x_i \in \{\pm 1\}$ and since the wire noise is symmetric about 0 by Definition 5, $x_i\, \Xi(\nu_{ij}^{(0)}(z), x_i w_{ij}^{(0)})$ will correspond to $x_i \mu_{ij}^{(0)}(z)$, in the sense that error event probabilities will be identical.

Assume, as the inductive assumption, that $\mu_{ij}^{(\ell)}(y)$ corresponds to $x_i \mu_{ij}^{(\ell)}(z)$ for all pairs $(i, j)$ and some $\ell \ge 0$. Let $\mathcal{N}_{n_j}$ be the set of all variable nodes that are connected to check node $n_j$. Since $x$ is a codeword, it satisfies the parity checks, and so $\prod_{k \in \mathcal{N}_{n_j}} x_k = 1$. Then from the check node symmetry condition (Definition 3), $\nu_{ji}^{(\ell)}(y)$ corresponds to $x_i \nu_{ji}^{(\ell)}(z)$. Further, by the wire noise symmetry condition (Definition 5) and the same argument as for the base case, $\mu_{ji}^{(\ell)}(y)$ corresponds to $x_i \mu_{ji}^{(\ell)}(z)$. By invoking the variable node symmetry condition (Definition 3) again, it follows that $\nu_{ij}^{(\ell+1)}(y)$ corresponds to $x_i \nu_{ij}^{(\ell+1)}(z)$ for all pairs.

Thus by induction, all messages to and from variable node $n_i$ when $y$ is received correspond to the product of $x_i$ and the corresponding message when $z$ is received.

Both decoders proceed in correspondence and commit exactly the same number of errors. 
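The correspondence argument can be checked by brute force on a small example. The sketch below runs a noiseless Gallager A decoder with $\pm 1$ messages on a (7,4) Hamming code (a hypothetical choice, not an ensemble from the paper), feeds it $y = xz$ under the multiplicative channel model (5), and verifies on every edge and iteration that each message for $y$ equals $x_i$ times the corresponding message for $z$. Decoder noise is omitted, so only the exact, noiseless case of the induction is illustrated; the update rules are the standard Gallager A rules written with our own helper names.

```python
from itertools import product

# Hypothetical example code: parity-check matrix of the (7,4) Hamming code.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
N = len(H[0])
EDGES = [(i, j) for j, row in enumerate(H) for i, h in enumerate(row) if h]
VAR_NBRS = {i: [j for (k, j) in EDGES if k == i] for i in range(N)}
CHK_NBRS = {j: [i for (i, k) in EDGES if k == j] for j in range(len(H))}


def gallager_a(y, iters):
    """Noiseless Gallager A with +/-1 messages; returns (v2c, c2v) per iteration."""
    v2c = {e: y[e[0]] for e in EDGES}  # iteration 0: forward the channel value
    history = [(dict(v2c), {})]
    for _ in range(iters):
        # Check node: product of the other incoming variable-to-check messages.
        c2v = {}
        for (i, j) in EDGES:
            prod = 1
            for k in CHK_NBRS[j]:
                if k != i:
                    prod *= v2c[(k, j)]
            c2v[(i, j)] = prod
        # Variable node: send the channel value unless every other incoming
        # check message disagrees with it (degree-1 nodes always forward y[i]).
        new_v2c = {}
        for (i, j) in EDGES:
            others = [c2v[(i, k)] for k in VAR_NBRS[i] if k != j]
            flip = bool(others) and all(m == -y[i] for m in others)
            new_v2c[(i, j)] = -y[i] if flip else y[i]
        v2c = new_v2c
        history.append((dict(v2c), c2v))
    return history


def codewords():
    """All codewords of H, mapped to +/-1 (bit 0 -> +1, bit 1 -> -1)."""
    for bits in product((0, 1), repeat=N):
        if all(sum(h * b for h, b in zip(row, bits)) % 2 == 0 for row in H):
            yield [1 - 2 * b for b in bits]


def messages_correspond(x, z, iters=3):
    """True if every message for y = x*z equals x_i times the message for z."""
    y = [xi * zi for xi, zi in zip(x, z)]  # multiplicative channel model
    for (vy, cy), (vz, cz) in zip(gallager_a(y, iters), gallager_a(z, iters)):
        for (i, j) in EDGES:
            if vy[(i, j)] != x[i] * vz[(i, j)]:
                return False
            if cy and cy[(i, j)] != x[i] * cz[(i, j)]:
                return False
    return True
```

Running `messages_correspond` over all 16 codewords and all $2^7$ noise patterns should return `True` every time, whereas a non-codeword $x$ breaks the check-node step, since the product of its entries over some check is $-1$.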

1) Worst-Case Noise: The same result with the same basic proof also holds when the wire noise operation $\Xi$ is symmetric but $w$ is not symmetric stochastic but is instead worst-case. The only essential modification is in (6) and the related part of the induction step. Since wire noise is dependent on $x_i$, it can be written as $x_i w$. Thus,

$$\mu_{ij}^{(0)}(y) = \Xi\left(\nu_{ij}^{(0)}(y), x_i w_{ij}^{(0)}\right) = \Xi\left(x_i \nu_{ij}^{(0)}(z), x_i w_{ij}^{(0)}\right) \overset{(a)}{=} x_i\, \Xi\left(\nu_{ij}^{(0)}(z), w_{ij}^{(0)}\right) = x_i \mu_{ij}^{(0)}(z),$$

where step (a) follows because $x_i \in \{\pm 1\}$, and so it can be taken outside of $\Xi$ by the symmetry property of $\Xi$. Thus the two decoders will proceed in exact one-to-one correspondence, not just in probabilistic correspondence.
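The symmetry step used in (6) and in step (a) can be illustrated with one concrete wire-noise operation. Definition 5 is not restated in this section, so the sketch assumes additive noise $\Xi(m, w) = m + w$ with integer-valued $m$ and $w$ (a hypothetical example). It checks on random draws that $\Xi(xm, w) = x\,\Xi(m, xw)$ and $\Xi(xm, xw) = x\,\Xi(m, w)$ for $x \in \{\pm 1\}$, and that multiplying a noise support symmetric about 0 by $x$ leaves it unchanged, which is why the two cases correspond in distribution.

```python
import random

SUPPORT = list(range(-5, 6))  # hypothetical noise support, symmetric about 0


def xi_additive(m, w):
    """Hypothetical symmetric wire-noise operation: additive noise."""
    return m + w


def check_wire_symmetry(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.randint(-20, 20)   # integer messages keep the comparison exact
        w = rng.choice(SUPPORT)
        for x in (1, -1):
            # Stochastic case, as in (6): x leaves Xi once it multiplies the noise.
            if xi_additive(x * m, w) != x * xi_additive(m, x * w):
                return False
            # Worst-case noise written as x*w, as in step (a).
            if xi_additive(x * m, x * w) != x * xi_additive(m, w):
                return False
    return True


def noise_law_invariant(x):
    """x * w has the same (uniform) law as w when the support is symmetric."""
    return sorted(x * w for w in SUPPORT) == sorted(SUPPORT)
```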

Appendix B
Proof of Theorem 2

Prior to giving the proof of Theorem 2, a review of some definitions from probability theory [93] and the Hoeffding-Azuma inequality is provided.

Consider a measurable space $(\Omega, \mathcal{F})$ consisting of a sample space $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ of subsets of $\Omega$ that contains the whole space and is closed under complementation and countable unions. A random variable is an $\mathcal{F}$-measurable function on $\Omega$. If there is a collection $(Z_\gamma \mid \gamma \in C)$ of random variables $Z_\gamma : \Omega \to \mathbb{R}$, then

$$\mathcal{Z} = \sigma(Z_\gamma \mid \gamma \in C)$$

is defined to be the smallest $\sigma$-algebra $\mathcal{Z}$ on $\Omega$ such that each map $(Z_\gamma \mid \gamma \in C)$ is $\mathcal{Z}$-measurable.

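Definition 12 (Filtration): Let $\{\mathcal{F}_i\}$ be a sequence of $\sigma$-algebras with respect to the same sample space $\Omega$. These $\mathcal{F}_i$ are said to form a filtration if $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ are ordered by refinement, in the sense that each subset of $\Omega$ in $\mathcal{F}_i$ is also in $\mathcal{F}_j$ for $i \le j$. Also $\mathcal{F}_0 = \{\emptyset, \Omega\}$.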

Usually, $\{\mathcal{F}_i\}$ is the natural filtration $\mathcal{F}_i = \sigma(Z_0, Z_1, \ldots, Z_i)$ of some sequence of random variables $(Z_0, Z_1, \ldots)$, and then the knowledge about $\omega$ known at step $i$ consists of the values $Z_0(\omega), Z_1(\omega), \ldots, Z_i(\omega)$.

For a probability triple $(\Omega, \mathcal{F}, P)$, a version of the conditional expectation of a random variable $Z$ given a $\sigma$-algebra $\mathcal{F}$ is a random variable denoted $E[Z \mid \mathcal{F}]$. Two versions of conditional expectation agree almost surely, but measure-zero departures are not considered subsequently; one version is fixed as canonical. Conditional expectation given a measurable event $\mathcal{E}$ is denoted $E[Z \mid \sigma(\mathcal{E})]$, and conditional expectation given a random variable $W$ is denoted $E[Z \mid \sigma(W)]$.

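Definition 13 (Martingale): Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ be a filtration on $\Omega$ and let $Z_0, Z_1, \ldots$ be a sequence of random variables on $\Omega$ such that $Z_i$ is $\mathcal{F}_i$-measurable. Then $Z_0, Z_1, \ldots$ is a martingale with respect to the filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ if $E[Z_i \mid \mathcal{F}_{i-1}] = Z_{i-1}$.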

A generic way to construct a martingale is Doob's construction. 

Definition 14 (Doob Martingale): Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ be a filtration on $\Omega$ and let $Z$ be a random variable on $\Omega$. Then the sequence of random variables $Z_0, Z_1, \ldots$ such that $Z_i = E[Z \mid \mathcal{F}_i]$ is a Doob martingale.
Lemma 1 (Hoeffding-Azuma Inequality [?], [94], [95]): Let $Z_0, Z_1, \ldots$ be a martingale with respect to the filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ such that for each $i > 0$, the following bounded difference condition is satisfied:

$$|Z_i - Z_{i-1}| \le \alpha_i, \quad \alpha_i \in [0, \infty).$$

Then for all $n > 0$ and any $\xi > 0$,

$$\Pr\left[|Z_n - Z_0| \ge \xi\right] \le 2\exp\left(-\frac{\xi^2}{2\sum_{k=1}^{n} \alpha_k^2}\right).$$
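As a numerical illustration of Lemma 1 (not part of the proof), consider the Doob martingale of the number of ones $S$ among $n$ fair bits revealed one at a time. Then $Z_n - Z_0 = S - n/2$ and every step changes $Z_i$ by exactly $1/2$, so $\alpha_k = 1/2$. The sketch compares the exact binomial tail with the bound; the helper names are hypothetical.

```python
import math


def azuma_bound(xi, alphas):
    """Right-hand side of Lemma 1: 2 exp(-xi^2 / (2 * sum alpha_k^2))."""
    return 2 * math.exp(-xi ** 2 / (2 * sum(a * a for a in alphas)))


def exact_tail(n, xi):
    """Pr[|S - n/2| >= xi] for S ~ Binomial(n, 1/2), i.e. Pr[|Z_n - Z_0| >= xi]."""
    hits = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= xi)
    return hits / 2 ** n
```

For $n = 100$ and $\xi = 10$, the bound is $2e^{-2} \approx 0.27$, while the exact tail is roughly $0.06$: the inequality is loose but decays at the right exponential rate in $\xi^2/n$, which is all the concentration argument requires.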



Now to the proof of Theorem 2. As noted before, it is an extension of [13, Theorem 2] or [65, Theorem 4.94]. The basic idea is to construct a Doob martingale about the object of interest by revealing various randomly determined aspects in a filtration-refining manner. The first set of steps is used to reveal which code was chosen from the ensemble of codes; the $n d_v$ edges in the bipartite graph are ordered in some arbitrary manner and exposed one by one. Then the $n$ channel noise realizations are revealed. At this point the exact graph and the exact channel noise realizations encountered have been revealed. Now the decoder noise realizations must be revealed. There are $n$ variable nodes, so the computation noise in each of them is revealed one by one. There are $n d_v$ edges over which variable-to-check communication noise is manifested. Then there are $n d_v/d_c$ check nodes with computation noise, and finally there are $n d_v$ check-to-variable communication noises for one iteration of the algorithm. The decoder noise realizations are revealed for each iteration. At the beginning of the revelation process, the average (over choice of code, channel noise realization, and decoder noise realization) is known; after the $m = (d_v + 2\ell d_v + 1 + \ell + \ell d_v/d_c)n$ revelation steps, the exact system used is known.
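The bookkeeping behind $m$ can be checked mechanically. The sketch below (hypothetical helper names) adds up the revelation steps in the order just described and compares the total with the closed form; `Fraction` is used because $n d_v/d_c$ need not be an integer for arbitrary arguments.

```python
from fractions import Fraction


def exposure_steps(n, dv, dc, ell):
    """Number of revelation steps m, counted in the exposure order of the proof."""
    graph = n * dv                 # one step per edge of the bipartite graph
    channel = n                    # one step per channel noise realization
    per_iteration = (
        n                          # variable-node computation noise
        + n * dv                   # variable-to-check wire noise
        + Fraction(n * dv, dc)     # check-node computation noise (n dv / dc checks)
        + n * dv                   # check-to-variable wire noise
    )
    return graph + channel + ell * per_iteration


def m_closed_form(n, dv, dc, ell):
    """m = (dv + 2 ell dv + 1 + ell + ell dv / dc) n."""
    return (dv + 2 * ell * dv + 1 + ell + Fraction(ell * dv, dc)) * n
```

For example, with $n = 12$, a (3, 6) ensemble, and $\ell = 2$, both functions return 228: 36 edge steps, 12 channel steps, and $2 \times 90$ decoder-noise steps.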

Recall that $Z$ denotes the number of incorrect values held at the end of the $\ell$th iteration for a particular $(g, y, w, u) \in \Omega$. Here $g \in \mathcal{G}^n(d_v, d_c)$ is a graph in the set of labeled bipartite factor graphs with variable node degree $d_v$ and check node degree $d_c$; $y \in \mathcal{Y}^n$ is a particular input to the decoder; $w \in \mathcal{W}^{2\ell d_v n}$ is a particular realization of the message-passing noise; and $u \in \mathcal{U}^{(\ell + \ell d_v/d_c)n}$ is a particular realization of the local computation noise. The sample space is therefore $\Omega = \mathcal{G}^n(d_v, d_c) \times \mathcal{Y}^n \times \mathcal{W}^{2\ell d_v n} \times \mathcal{U}^{(\ell + \ell d_v/d_c)n}$.

In order to define random variables, first define the following exposure procedure. Suppose realizations of random quantities are exposed sequentially. First expose the $d_v n$ edges of the graph one at a time. At step $i < d_v n$, expose the particular check node socket that is connected to the $i$th variable node socket. Next, in the following $n$ steps, expose the received values $y_i$ one at a time. Finally, in the remaining $(2d_v + 1 + d_v/d_c)\ell n$ steps, expose the decoder noise values $u_i$ and $w_i$ that were encountered in all iterations up to iteration $\ell$.

Let $\equiv_i$, $0 \le i \le m$, be a sequence of equivalence relations on the sample space $\Omega$ ordered by refinement. Refinement means that $(g', y', w', u') \equiv_i (g'', y'', w'', u'')$ implies $(g', y', w', u') \equiv_{i-1} (g'', y'', w'', u'')$. The equivalence relations define equivalence classes such that $(g', y', w', u') \equiv_i (g'', y'', w'', u'')$ if and only if the realizations of random quantities revealed in the first $i$ steps are the same for both.

Now, define a sequence of random variables $Z_0, Z_1, \ldots, Z_m$. Let the random variable $Z_0$ be $Z_0 = E[Z]$, where the expectation is over the code choice, channel noise, and decoder noise. The remaining random variables $Z_i$ are constructed as conditional expectations given the measurable equivalence events $(g', y', w', u') \equiv_i (g, y, w, u)$:

$$Z_i(g, y, w, u) = E\left[Z(g', y', w', u') \mid \sigma\big((g', y', w', u') \equiv_i (g, y, w, u)\big)\right].$$

Note that $Z_m = Z$ and that by construction $Z_0, Z_1, \ldots, Z_m$ is a Doob martingale. The filtration is understood to be the natural filtration of the random variables $Z_0, Z_1, \ldots, Z_m$.
To use the Hoeffding-Azuma inequality to give bounds on

$$\Pr\left[|Z - E[Z]| > n d_v \epsilon/2\right] = \Pr\left[|Z_m - Z_0| > n d_v \epsilon/2\right],$$

bounded difference conditions

$$|Z_{i+1}(g, y, w, u) - Z_i(g, y, w, u)| \le \alpha_i, \quad i = 0, \ldots, m - 1,$$

need to be proved for suitable constants $\alpha_i$ that may depend on $d_v$, $d_c$, and $\ell$.

For the steps where bipartite graph edges are exposed, it was shown in [13, p. 614] that

$$|Z_{i+1}(g, y, w, u) - Z_i(g, y, w, u)| \le 8(d_v d_c)^{\ell}, \quad 0 \le i < n d_v.$$
It was further shown in [13, p. 615] that for the steps when the channel outputs are revealed,

$$|Z_{i+1}(g, y, w, u) - Z_i(g, y, w, u)| \le 2(d_v d_c)^{\ell}, \quad n d_v \le i < n(1 + d_v). \qquad (7)$$

It remains to show that the inequality is also fulfilled for steps when decoder noise realizations are revealed. The bounding procedure is nearly identical to that which yields (7). When a node noise realization $u$ is revealed, clearly only something whose directed neighborhood includes the node at which the noise $u$ causes perturbations can be affected. Similarly, when an edge noise realization $w$ is revealed, only something whose directed neighborhood includes the edge on which the noise $w$ causes perturbations can be affected. In [13, p. 603], it is shown that the size of the directed neighborhood of depth $2\ell$ of the node $n(u)$ associated with noise $u$ is bounded as $|\mathcal{N}^{2\ell}_{n(u)}| \le 2(d_v d_c)^{\ell}$, and similarly the size of the directed neighborhood of depth $2\ell$ of the edge $e(w)$ associated with noise $w$ is bounded as $|\mathcal{N}^{2\ell}_{e(w)}| \le 2(d_v d_c)^{\ell}$. Since the maximum depth that can be affected by a noise perturbation is $2\ell$, a weak uniform bound for the remaining exposure steps is

$$|Z_{i+1}(g, y, w, u) - Z_i(g, y, w, u)| \le 2(d_v d_c)^{\ell}, \quad n(1 + d_v) \le i < m.$$

Since bounded difference constants $\alpha_i$ have been provided for all $i$, the theorem follows from application of the Hoeffding-Azuma inequality to the martingale.

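One may compute a particular value of $\beta$ to use as follows. The bounded difference sum is

$$\begin{aligned}
\sum_{k=1}^{m} \alpha_k^2 &= 64 n d_v (d_v d_c)^{2\ell} + 4n (d_v d_c)^{2\ell} + 4\left(2\ell d_v n + n\ell + n\ell d_v/d_c\right)(d_v d_c)^{2\ell} \\
&= n\left[64 d_v + 4 + 8 d_v \ell + 4\ell + 4\ell d_v/d_c\right] d_v^{2\ell} d_c^{2\ell}.
\end{aligned}$$

Setting constants in the theorem and in the Hoeffding-Azuma inequality equal, with $\xi = n d_v \epsilon/2$, yields

$$\frac{1}{\beta} = 512 d_v^{2\ell-1} d_c^{2\ell} + 32 d_v^{2\ell-2} d_c^{2\ell} + 64\ell d_v^{2\ell-1} d_c^{2\ell} + 32\ell d_v^{2\ell-2} d_c^{2\ell} + 32\ell d_v^{2\ell-1} d_c^{2\ell-1} \le (544 + 128\ell) d_v^{2\ell-1} d_c^{2\ell}.$$

Thus $1/\beta$ can be taken as $(544 + 128\ell) d_v^{2\ell-1} d_c^{2\ell}$.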
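Because this constant is pure bookkeeping, it is easy to recompute. The sketch below (hypothetical helper names, valid for $\ell \ge 1$) sums $\alpha_i^2$ directly from the per-step bounds stated above, namely $8(d_v d_c)^\ell$ for each of the $n d_v$ edge steps and $2(d_v d_c)^\ell$ for each channel and decoder-noise step, and then matches $2e^{-\beta\epsilon^2 n}$ with the Hoeffding-Azuma bound at $\xi = n d_v \epsilon/2$, which gives $\beta = n d_v^2 / (8\sum_k \alpha_k^2)$. Exact rationals make it easy to confirm that $1/\beta$ does not depend on $n$.

```python
from fractions import Fraction


def sum_alpha_sq(n, dv, dc, ell):
    """Sum of alpha_i^2 over all m exposure steps, from the per-step bounds."""
    unit = Fraction(dv * dc) ** (2 * ell)                        # (dv dc)^{2 ell}
    edge_steps = n * dv                                          # bound 8 (dv dc)^ell
    channel_steps = n                                            # bound 2 (dv dc)^ell
    noise_steps = ell * n * Fraction(2 * dv * dc + dc + dv, dc)  # (2dv + 1 + dv/dc) ell n
    return (64 * edge_steps + 4 * channel_steps + 4 * noise_steps) * unit


def inv_beta(dv, dc, ell, n=1):
    """1/beta from beta = n dv^2 / (8 * sum alpha^2)."""
    return 8 * sum_alpha_sq(n, dv, dc, ell) / (n * dv ** 2)


def inv_beta_closed_form(dv, dc, ell):
    """The five-term expression for 1/beta (requires ell >= 1)."""
    a, b = dv ** (2 * ell - 1), dc ** (2 * ell)
    return (512 * a * b + 32 * Fraction(a, dv) * b + 64 * ell * a * b
            + 32 * ell * Fraction(a, dv) * b + 32 * ell * a * Fraction(b, dc))
```

For a (3, 6) ensemble with $\ell = 1$, both `inv_beta(3, 6, 1)` and `inv_beta_closed_form(3, 6, 1)` evaluate to 65088.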

Appendix C 
An Analytical Expression 

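An analytical expression for $\varepsilon^*(\eta = 1/10, \alpha = 5 \times 10^{-3})$ is

$$\frac{1}{2}\left(1 - \sqrt{1 + 4\omega}\right),$$

where $\omega$ is the second root of the polynomial in $\varepsilon$

$$c_1 + c_2\varepsilon + c_3\varepsilon^2 + c_4\varepsilon^3 + c_5\varepsilon^4 + c_6\varepsilon^5,$$

and constants $(c_1, \ldots, c_6)$ are defined as follows.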

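$$\begin{aligned}
c_1 &= 36\alpha^2 - 360\alpha^3 + 1860\alpha^4 - 6240\alpha^5 + 14752\alpha^6 - 25344\alpha^7 + 31680\alpha^8 \\
&\quad - 28160\alpha^9 + 16896\alpha^{10} - 6144\alpha^{11} + 1024\alpha^{12}
= \frac{3424572914129280658801}{4 \times 10^{24}},
\end{aligned}$$

$$\begin{aligned}
c_2 &= 1 - 72\alpha + 1080\alpha^2 - 8160\alpha^3 + 38640\alpha^4 - 125952\alpha^5 + 295424\alpha^6 - 506880\alpha^7 \\
&\quad + 633600\alpha^8 - 563200\alpha^9 + 337920\alpha^{10} - 122880\alpha^{11} + 20480\alpha^{12}
= \frac{133200752195329280658801}{2 \times 10^{23}},
\end{aligned}$$

$$\begin{aligned}
c_3 &= 32 - 864\alpha + 10080\alpha^2 - 69120\alpha^3 + 314880\alpha^4 - 1012224\alpha^5 + 2364928\alpha^6 - 4055040\alpha^7 \\
&\quad + 5068800\alpha^8 - 4505600\alpha^9 + 2703360\alpha^{10} - 983040\alpha^{11} + 163840\alpha^{12}
= \frac{698088841835929280658801}{2.5 \times 10^{22}},
\end{aligned}$$

$$\begin{aligned}
c_4 &= 160 - 3840\alpha + 42240\alpha^2 - 281600\alpha^3 + 1267200\alpha^4 - 4055040\alpha^5 + 9461760\alpha^6 - 16220160\alpha^7 \\
&\quad + 20275200\alpha^8 - 18022400\alpha^9 + 10813440\alpha^{10} - 3932160\alpha^{11} + 655360\alpha^{12}
= \frac{886384871716129280658801}{6.25 \times 10^{21}},
\end{aligned}$$

$$\begin{aligned}
c_5 &= 320 - 7680\alpha + 84480\alpha^2 - 563200\alpha^3 + 2534400\alpha^4 - 8110080\alpha^5 + 18923520\alpha^6 - 32440320\alpha^7 \\
&\quad + 40550400\alpha^8 - 36044800\alpha^9 + 21626880\alpha^{10} - 7864320\alpha^{11} + 1310720\alpha^{12}
= \frac{886384871716129280658801}{3.125 \times 10^{21}},
\end{aligned}$$

$$\begin{aligned}
c_6 &= 256 - 6144\alpha + 67584\alpha^2 - 450560\alpha^3 + 2027520\alpha^4 - 6488064\alpha^5 + 15138816\alpha^6 - 25952256\alpha^7 \\
&\quad + 32440320\alpha^8 - 28835840\alpha^9 + 17301504\alpha^{10} - 6291456\alpha^{11} + 1048576\alpha^{12}
= \frac{886384871716129280658801}{3.90625 \times 10^{21}}.
\end{aligned}$$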

As given in Table II, the numerical value of $\varepsilon^*(\eta = 1/10, \alpha = 5 \times 10^{-3})$ is 0.0266099758. Similarly complicated analytical expressions are available for the other entries of Table II and the values used to create Figs. 4, 5, and 6.
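The expression above can be evaluated numerically as a consistency check. The sketch below computes $c_1, \ldots, c_6$ at $\alpha = 5 \times 10^{-3}$ from their polynomial expressions, finds a root $\omega$ of $c_1 + c_2\omega + \cdots + c_6\omega^5$ by bisection, and maps it through $\varepsilon = \tfrac{1}{2}(1 - \sqrt{1 + 4\omega})$. Two assumptions are worth flagging: the bracket $[-0.03, -0.02]$ was chosen by hand for this $\alpha$ and is not claimed to identify "the second root" in the paper's ordering; and the radicand $1 + 4\omega$ is our reading of the expression, checked against the tabulated value 0.0266099758 rather than taken directly from the source.

```python
import math
from fractions import Fraction

ALPHA = Fraction(5, 1000)  # alpha = 5 x 10^-3

# c_1, ..., c_6 as polynomials in alpha, lowest power first.
C_POLYS = [
    [0, 0, 36, -360, 1860, -6240, 14752, -25344, 31680, -28160, 16896, -6144, 1024],
    [1, -72, 1080, -8160, 38640, -125952, 295424, -506880, 633600, -563200, 337920, -122880, 20480],
    [32, -864, 10080, -69120, 314880, -1012224, 2364928, -4055040, 5068800, -4505600, 2703360, -983040, 163840],
    [160, -3840, 42240, -281600, 1267200, -4055040, 9461760, -16220160, 20275200, -18022400, 10813440, -3932160, 655360],
    [320, -7680, 84480, -563200, 2534400, -8110080, 18923520, -32440320, 40550400, -36044800, 21626880, -7864320, 1310720],
    [256, -6144, 67584, -450560, 2027520, -6488064, 15138816, -25952256, 32440320, -28835840, 17301504, -6291456, 1048576],
]


def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ... by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def threshold(alpha=ALPHA, lo=-0.03, hi=-0.02, iters=100):
    """Bisect for omega in [lo, hi]; return (omega, (1 - sqrt(1 + 4 omega)) / 2).

    The bracket and the omega -> epsilon map are assumptions (see lead-in).
    """
    c = [float(horner(p, alpha)) for p in C_POLYS]

    def poly(w):
        return horner(c, w)  # c_1 + c_2 w + ... + c_6 w^5

    if poly(lo) * poly(hi) > 0:
        raise ValueError("bracket does not straddle a sign change")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if poly(lo) * poly(mid) <= 0:
            hi = mid
        else:
            lo = mid
    omega = (lo + hi) / 2
    return omega, (1 - math.sqrt(1 + 4 * omega)) / 2
```

The root found is $\omega \approx -0.0259019$, and the resulting $\varepsilon$ agrees with 0.0266099758 to the printed precision. Equivalently, $\omega = -\varepsilon^*(1 - \varepsilon^*)$.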